Add check for transformers version to fix some CodeLlama-based models' behaviors #998
Conversation
Hi @HermitSun, is setting …
Thank you for pointing this out! It seems that this problem is actually caused by fp16 precision overflow in some cases, and I missed it 😭. I will double-check it in my case.

This version mismatch check is useful for me, because in my work I try many different models with different transformers versions. Since …

And what do you think about adding this version check when using CodeLlama-based models? I mean, if you do not think my current change is necessary, I can change it to simply printing a warning when a version mismatch is detected, and then I will try working on the precision problem in a new PR.
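To illustrate the kind of fp16 overflow being described, here is a minimal, self-contained sketch (the base and head dimension are assumed for the example; this is not vLLM's actual RoPE code): a CodeLlama-style base of 1e6 already exceeds the fp16 range, so inverse frequencies computed in half precision collapse.

```python
import torch

# Assumed example values; not taken from vLLM's kernels.
rope_theta = 1_000_000.0   # CodeLlama-style base; original LLaMA uses 10000.0
head_dim = 128

exponents = torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim

inv_freq_fp32 = 1.0 / (rope_theta ** exponents)                    # well-behaved
theta_fp16 = torch.tensor(rope_theta, dtype=torch.float16)         # overflows to inf (fp16 max ~65504)
inv_freq_fp16 = 1.0 / (theta_fp16 ** exponents.to(torch.float16))  # degenerate values

print(theta_fp16)          # tensor(inf, dtype=torch.float16)
print(inv_freq_fp32[:3])   # tensor([1.0000, 0.8058, 0.6494])
print(inv_freq_fp16[:3])   # tensor([1., 0., 0.], dtype=torch.float16)
```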
Related: I'm trying to run …
I noticed that this code uses a …
Hi @HermitSun, no worries. @imoneoi and I opened PR #1004 to fix this issue.
Actually, I don't understand. Isn't …
AFAIK, wizardcoder-34b was trained using transformers==4.31.0, which doesn't support rope theta; changing it back to the base of 10000 will be OK.
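For anyone who wants to try that workaround, a minimal sketch of patching a locally downloaded checkpoint's `config.json` could look like this (the path is a placeholder, and if the config carries no `rope_theta` at all there is nothing to change):

```python
import json
from pathlib import Path

# Placeholder path to a locally downloaded checkpoint; adjust as needed.
config_path = Path("/path/to/WizardCoder-Python-34B-V1.0/config.json")

config = json.loads(config_path.read_text())
print("rope_theta before:", config.get("rope_theta"))

# Fall back to the original LLaMA base, since these weights were trained
# before CodeLlama-style rope_theta support existed.
config["rope_theta"] = 10000.0
config_path.write_text(json.dumps(config, indent=2))
```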
Hi @WoosukKwon, after applying the changes in PR #1004, I find the model works well (at least it looks well... maybe I need more tests). It seems that we do not have to change the …

And I originally opened this PR to check the …
The HF repos of the …
I additionally did some simple experiments with WizardLM/WizardCoder-Python-34B-V1.0:

Because the model will give gibberish outputs when …

And I noticed that WizardLM/WizardCoder-Python-13B-V1.0 and WizardLM/WizardCoder-Python-34B-V1.0 both require vllm==0.1.4. At that time, vLLM did not support reading …
@HermitSun What is interesting is that slightly increasing rope theta (e.g. to 16384) might improve the score.
Amazing! I will update my results below, and it seems to prove that …

And I wonder if this conclusion (i.e. a slight increase of …
@HermitSun Thanks a lot for the evaluation. Could you please share the script you used for it? And, if possible, could you measure the accuracy on the `rope-fp32` branch (`git checkout rope-fp32`, then `pip install -e .`)?
The same issue is discussed at …
I used the evaluation scripts in WizardLM's official repo and followed their environment setup steps: https://github.com/nlpxucan/WizardLM/tree/main/WizardCoder. But I think my environment has two minor differences from the authors', and that may be why I reported a result lower than the authors claimed:
I will use the latest code (instead of my fork), apply the fp32 precision patch (and compare the two patches in PR #1004 and the rope-fp32 branch), and then update my test results. And I think we'd better discuss the precision problem further in PR #1004? We can solve the …
@WoosukKwon I used the latest code and applied the fp32 precision patches; the updated results are here:

It seems that PR #1004 performs better when using the default …
Hi @HermitSun, thanks for the benchmark results. It seems #1004 is enough for the precision problem. Then, I believe it should be the users who are responsible for setting the right …

While it makes sense to me, I'm wondering what the exact logic would be. 99% of the models in the HF model hub were trained with transformers < v4.33.1. How can we tell when to warn? If I understand correctly, your PR will warn for every LLaMA model trained before 4.33.1, including LLaMA V1 and V2. How can we minimize such false positives?
Hi @WoosukKwon, I think you're right; maybe we should not introduce extra workload for an inference framework. As for the version mismatch detection problem, I think we can add some heuristic rules to check the …

If you feel this PR (including replacing the …
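For reference, one possible shape for such a heuristic (a sketch only; the helper name is hypothetical and the 4.33.0 cutoff follows the release that introduced `rope_theta`): warn only when a config both predates `rope_theta` support and still carries a non-default `rope_theta`, so plain LLaMA V1/V2 configs, which omit the field entirely, would not trigger false positives.

```python
import warnings

from packaging import version

ROPE_THETA_DEFAULT = 10000.0
ROPE_THETA_MIN_VERSION = version.parse("4.33.0")  # release that introduced rope_theta


def maybe_warn_rope_theta_mismatch(hf_config) -> None:
    """Hypothetical heuristic: warn only when the mismatch looks suspicious."""
    rope_theta = getattr(hf_config, "rope_theta", ROPE_THETA_DEFAULT)
    saved_version = getattr(hf_config, "transformers_version", None)
    # Configs without rope_theta (e.g. LLaMA V1/V2) or without a recorded
    # transformers_version fall through here and never warn.
    if saved_version is None or rope_theta == ROPE_THETA_DEFAULT:
        return
    if version.parse(saved_version) < ROPE_THETA_MIN_VERSION:
        warnings.warn(
            f"config.json sets rope_theta={rope_theta} but was saved with "
            f"transformers=={saved_version}, which predates rope_theta support; "
            "the weights may not have been trained with this base.")
```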
Support for CodeLlama was added in transformers 4.33.0 (huggingface/transformers#25740). That PR introduces the `rope_theta` param for RoPE scaling.

However, some CodeLlama-based models' weights were released before that PR was merged. That is to say, those models did not take `rope_theta` into consideration, and if we directly use the `rope_theta` from their configs like this, those models will give gibberish outputs.

Fortunately, there is a `transformers_version` attribute in HF models' `config.json`, so we can add a check to solve this problem: if we find that the model uses a legacy version of `transformers`, we simply use the default `rope_theta` value.
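A minimal sketch of that idea (the helper name and the exact wiring into config loading are assumptions for illustration, not the literal diff of this PR):

```python
from packaging import version

DEFAULT_ROPE_THETA = 10000.0  # original LLaMA base


def resolve_rope_theta(hf_config) -> float:
    """Sketch: ignore rope_theta from configs written before transformers 4.33.0."""
    saved_version = getattr(hf_config, "transformers_version", None)
    rope_theta = getattr(hf_config, "rope_theta", DEFAULT_ROPE_THETA)
    if saved_version is not None and version.parse(saved_version) < version.parse("4.33.0"):
        # The checkpoint predates rope_theta support, so the value in its config
        # cannot reflect how the weights were actually trained.
        return DEFAULT_ROPE_THETA
    return rope_theta
```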
After applying this patch, I think at least the following models will work normally (tested on A100 with CUDA 11.8):