YaRN tests #1161
Conversation
Force-pushed: f74f49d → 63a52f6
Thanks for giving this a shot @viktor-ferenczi. YaRN models look impressive because of their low perplexity and long context windows, so I'm sure the community will love to test this out once it's ready.
Please finish this pull request; it would really help, because this model is very good.
I don't have extensive LLM (or vLLM) development experience yet. I'm learning it on the job here, so the implementation won't be fast (unless I get help on this). I'm committed to completing it at some point, but I also need to find the time to work on this. (I have a day job.)
@zhuohan123 @WoosukKwon please see this pull request
Force-pushed: 1fe53b2 → ca3800e
@casper-hansen There are #555 and #464. They seem to share code with YaRN, just with different RoPE scaling approaches. I suggest a unified configuration and a partially shared implementation. #555 would be a good starting point if it can be reviewed and finalized first; it would be pointless to redo work that has already been done there. That PR has different long-context test cases than the ones I wrote, so maybe they could be merged to use both. The new LLM option would look something like:

```python
rope_scaling=dict(
    type='linear',  # linear, dynamic or yarn
    factor=2.0,     # scaling factor
    # ... plus hyper-parameters if required, like YaRN's alpha and beta
)
```

Also, this configuration seems to already be defined in the YaRN models' config.json, except for the alpha and beta hyper-parameters. The paper mentions alpha=1 and beta=32 for Llama 2 models. In the 128k model's config.json:

```json
"rope_scaling": {
    "factor": 32.0,
    "original_max_position_embeddings": 4096,
    "type": "yarn",
    "finetuned": true
}
```

The smaller 64k model has:

```json
"rope_scaling": {
    "factor": 16.0,
    "original_max_position_embeddings": 4096,
    "type": "yarn",
    "finetuned": true
}
```

What do you think?
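As a rough sketch of how such a unified option could be parsed (all names here are hypothetical, not vLLM's actual API; the alpha/beta defaults are the paper's Llama 2 values):

```python
# Hypothetical sketch of parsing the unified rope_scaling option.
from dataclasses import dataclass
from typing import Optional

SUPPORTED_TYPES = {"linear", "dynamic", "yarn"}

@dataclass
class RopeScalingConfig:
    type: str
    factor: float
    original_max_position_embeddings: Optional[int] = None
    alpha: float = 1.0   # YaRN only; the paper's value for Llama 2
    beta: float = 32.0   # YaRN only; the paper's value for Llama 2

    def __post_init__(self):
        if self.type not in SUPPORTED_TYPES:
            raise ValueError(f"Unsupported rope_scaling type: {self.type!r}")
        if self.factor < 1.0:
            raise ValueError("rope_scaling factor must be >= 1.0")

def parse_rope_scaling(raw: dict) -> RopeScalingConfig:
    """Build the config from a raw dict, e.g. a model's config.json entry.

    Unknown keys (like the YaRN models' "finetuned" flag) are ignored."""
    known = {"type", "factor", "original_max_position_embeddings",
             "alpha", "beta"}
    return RopeScalingConfig(**{k: v for k, v in raw.items() if k in known})

# Example with the 64k YaRN config above:
cfg = parse_rope_scaling({"factor": 16.0,
                          "original_max_position_embeddings": 4096,
                          "type": "yarn", "finetuned": True})
```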
Force-pushed: ca3800e → a7595e1
Hi @viktor-ferenczi, I would be willing to contribute the implementation, unless you have already started work on it.
I just added tests and haven't written the actual YaRN code yet. What may help you is that #464 was merged recently. Please go ahead with the implementation, because I lack the time to work on it right now.
Implementation PR: #1264
Issue: #980
Currently the branch contains preliminary code to test context window quality with pass key retrieval tasks. It does not plot a graph; that is not the goal of the test. It can run the same test against both the reference YaRN implementation (using Transformers) and vLLM, with parameters chosen to produce comparable output, which allows our upcoming implementation to be compared against the reference one.
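To illustrate the pass key retrieval idea, here is a minimal sketch (the helper names below are mine, not from this branch): a random key is buried at a chosen depth inside filler text, the model is asked to recall it, and the test passes if the key appears in the output.

```python
import random

FILLER = ("The grass is green. The sky is blue. The sun is yellow. "
          "Here we go. There and back again. ")

def make_passkey_prompt(n_filler: int, key: int, depth: float) -> str:
    """Bury the pass key at a relative depth (0.0-1.0) in repeated filler."""
    chunks = [FILLER] * n_filler
    needle = f"The pass key is {key}. Remember it. {key} is the pass key. "
    chunks.insert(int(len(chunks) * depth), needle)
    return ("There is a pass key hidden inside a lot of irrelevant text. "
            "Find it and memorize it. I will quiz you about it.\n"
            + "".join(chunks)
            + "\nWhat is the pass key? The pass key is")

def retrieval_passed(generated: str, key: int) -> bool:
    """The retrieval succeeds if the key appears in the model's completion."""
    return str(key) in generated

# Example: build one long prompt, then feed it to both the Transformers
# reference and vLLM, and compare retrieval_passed() between the backends.
prompt = make_passkey_prompt(n_filler=400,
                             key=random.randint(10000, 99999),
                             depth=0.5)
```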
TODO