RoPE scaling support? #464
Hi @nivibilla, thanks for letting us know about the integration! We are interested in RoPE scaling and will investigate it.
Thank you! Looking forward to it.
@WoosukKwon Same request here; see also #479. NTK scaling could perform much better than PI (position interpolation). It would let vLLM run inference beyond 16k without users having to fine-tune their models.
@lucasjinreal Yes, that's right. And I've seen your issue. Can you make a PR? I'd be happy to test it out!
@nivibilla I have pasted the main modifications in that issue. Please adapt them to your vLLM base (my base has messy code, so a PR would be cumbersome). Feel free to ask me any questions if you get a chance to try it.
@lucasjinreal Sure, no problem. I will try it out.
Hey, any update on this?
RoPE scaling seems to have already been added in #555. I'm not sure how or whether to proceed here.
Implemented by #555 |
@WoosukKwon @youkaichao @nivibilla How does this extend the context length? Can you elaborate? If I need to extend llama3.1-8b from 8k to 128k, can I do that, and how? Is it done only with these args?
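As a rough illustration (not an answer from the maintainers), here is a minimal sketch of what those arguments can look like with vLLM's offline `LLM` API, assuming a recent release. Llama 3.1 already ships with llama3-style RoPE scaling in its HF config, so in principle only `max_model_len` needs to be raised; the `rope_scaling` override shown in the comment is for models without built-in scaling, and its exact dict keys vary between versions, so treat them as placeholders.

```python
# Sketch only: argument names and rope_scaling keys depend on the installed
# vLLM and transformers versions.
from vllm import LLM, SamplingParams

# Llama 3.1 already carries "llama3" RoPE scaling in its HF config, so the
# main knob is the context window (assuming enough GPU memory for the KV cache).
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    max_model_len=131072,  # 128k tokens
)

# For a base model without built-in scaling, an override along these lines
# has been supported ("rope_type" may be "type" on older releases):
# llm = LLM(
#     model="meta-llama/Meta-Llama-3-8B",
#     max_model_len=32768,
#     rope_scaling={"rope_type": "dynamic", "factor": 4.0},
# )

out = llm.generate(["A very long document ..."], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```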
HF merged RoPE scaling into their library. This allows the context length to be increased by 4x without retraining.
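For reference, a minimal sketch of the Hugging Face usage being referred to, assuming a transformers release that includes the `rope_scaling` config field for Llama-style models; the model name and scaling factor are illustrative only.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # example checkpoint, any Llama-style model

# Linear (position-interpolation style) scaling by 4x: a 4k-context model can
# then attend over ~16k positions without retraining, at some quality cost.
config = AutoConfig.from_pretrained(model_id)
config.rope_scaling = {"type": "linear", "factor": 4.0}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, config=config)
```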