Add documentation for expectations and standards related to model formats/configs #24
Conversation
Copilot reviewed 1 out of 1 changed files in this pull request and generated no comments.
📦 Build Artifacts Available
"depth": 5 | ||
} | ||
], | ||
"default_proposal_method": "tree", |
Just using this EAGLE example to talk this through ...
Currently, vLLM only supports greedy sampling with EAGLE. See https://github.com/vllm-project/vllm/blob/58738772410c5e0d60b61db39538a9b313d2d7ad/vllm/v1/spec_decode/eagle.py#L182
The discussion in vllm-project/vllm#16899 suggests that if vLLM does add support for random sampling, it should follow the sampling params of the target model in order to have the best chance of matching its distribution.
Tree decoding is proposed in vllm-project/vllm#17560 - tree depth and num_spec_expand are configurable via --speculative-config.
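For concreteness, this is roughly how those knobs surface in vLLM's offline API today. The `method`/`model`/`num_speculative_tokens` keys exist in vLLM's speculative config; the commented-out tree keys are my assumption based on #17560, not settled API, and the draft model path is just an example:

```python
# Sketch: passing speculative-decoding options to vLLM's offline API.
# "method", "model", and "num_speculative_tokens" are real config keys;
# the tree-specific keys below are assumptions based on
# vllm-project/vllm#17560 and may not match the final API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    speculative_config={
        "method": "eagle",
        "model": "yuhuili/EAGLE-LLaMA3.1-Instruct-8B",  # example draft model
        "num_speculative_tokens": 5,
        # Hypothetical tree-decoding knobs from the PR discussion:
        # "depth": 5,
        # "num_spec_expand": ...,
    },
)

# Greedy sampling, since that's all EAGLE supports in vLLM right now.
outputs = llm.generate(["The capital of France is"], SamplingParams(temperature=0.0))
print(outputs[0].outputs[0].text)
```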
So ...
What does it mean for the creator of a speculator model to list proposal methods like this?
Is this a recommendation that tree decoding with those parameters is optimal?
Might there be multiple tree decoding "profiles" with different values? How would a user choose between them?
What if the inference engine does not support that proposal method? Can it fall back to greedy?
What does it mean to specify `draft_tokens = 5`? Is this also a recommendation, i.e. it should be used unless the user specifies their own value?
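To make the fallback question concrete, here's a hypothetical resolution an engine might implement - nothing vLLM does today, and all names below are made up:

```python
# Hypothetical sketch of how an engine might resolve a proposal method
# from a speculator config. None of these names come from vLLM or the
# proposed spec; they only illustrate the fallback question above.
SUPPORTED_METHODS = {"greedy"}  # pretend this engine lacks tree decoding

def resolve_proposal_method(config: dict, user_override: str | None = None) -> str:
    # An explicit user choice wins over anything in the model config.
    if user_override is not None:
        return user_override
    # Otherwise treat default_proposal_method as a recommendation ...
    recommended = config.get("default_proposal_method", "greedy")
    if recommended in SUPPORTED_METHODS:
        return recommended
    # ... and fall back to greedy when the engine can't honor it.
    return "greedy"

print(resolve_proposal_method({"default_proposal_method": "tree"}))  # -> greedy
```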
HTH
"default_proposal_method": "tree", | ||
"verifier": { | ||
"name_or_path": "meta-llama/Llama-3.1-8B-Instruct", | ||
"architectures": ["LlamaForCausalLM"], |
Thinking through the implications of this ...
Is this supposed to override the verifier model's config (assuming they don't match)?
What would the use case for an override be?
I'd imagine it would be quite tricky to make this override work in vLLM - by the time we load this config, we've probably already done some setup of the verifier based on its own config.
If it's not an override, does vLLM just ignore it? That would be a bit weird ...
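A third option, somewhere between override and silent ignore, would be a consistency check. Hypothetical sketch, not existing vLLM behavior; the field names come from the example above:

```python
# Hypothetical sketch: validate that the verifier section of a speculator
# config matches the target model the engine has already loaded. Not
# existing vLLM behavior; it only illustrates a middle ground between
# overriding the verifier config and silently ignoring the mismatch.
def check_verifier_matches(speculator_config: dict, target_name: str,
                           target_architectures: list[str]) -> None:
    verifier = speculator_config.get("verifier", {})
    expected_name = verifier.get("name_or_path")
    if expected_name and expected_name != target_name:
        raise ValueError(
            f"Speculator was trained against {expected_name!r}, "
            f"but the loaded target model is {target_name!r}"
        )
    expected_archs = verifier.get("architectures")
    if expected_archs and expected_archs != target_architectures:
        raise ValueError(
            f"Verifier architectures {expected_archs} do not match "
            f"the target model's {target_architectures}"
        )

check_verifier_matches(
    {"verifier": {"name_or_path": "meta-llama/Llama-3.1-8B-Instruct",
                  "architectures": ["LlamaForCausalLM"]}},
    target_name="meta-llama/Llama-3.1-8B-Instruct",
    target_architectures=["LlamaForCausalLM"],
)  # passes silently
```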
- `"eagle"`, `"eagle_2"`, `"eagle_3"` - Eagle speculator variants based on Transformer architecture for the draft model | ||
- `"hass"` - Similar to Eagle based on the Transformer architecture for the draft model | ||
- `"mlp_speculator"` - Based on a multi-layer perceptron (MLP) architecture for the draft model | ||
- `"specdec"` - An independent speculator model |
It would be super convenient if we could match the existing names used in vLLM
```diff
+ALGORITHMS = [
+    "eagle",
+    "eagle3",
+    "medusa",
+    "mlp_speculator",
+    "deepseek_mtp",
```
Summary
Documentation and examples added to standardize the model serialization formats, their intended compatibility, and the corresponding deserialization formats.
Testing
Self review, PR reviews