Shape Error When Running Inference after Converting OpenLlama 3B to GGML #1709
Comments
Actually, I realize I was loading the wrong model, which was using the old format. I downloaded one in "ggjt v3" format and the error went away. Though I'm getting a different error now, which is this one: #1732
Same for me. It is also broken at the original commit (ffb06a3), tested with the 600bt version. The error can be fixed by applying the hack in #1588; quantized models then work fine as well. I don't see either the original hack or a suitable replacement having been merged with the original PR, @SlyEcho. Something else has broken since then, though, as quantized models output garbage in the current version (92f20d9). The converted fp16 model still works fine (with the hack).
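For background on the shape error itself: the loader derives the feed-forward width from n_embd and n_mult rather than reading it from the checkpoint, and for OpenLLaMA 3B the default derivation does not land on the checkpoint's actual size. Below is a small worked example; it is illustrative only, the dimensions (3200 / 8640) and the rounding formula are my assumptions about the model and the loader at the time, not an excerpt of the #1588 hack.

```python
# Illustration of the shape mismatch (assumed OpenLLaMA 3B dimensions:
# n_embd = 3200, actual feed-forward width in the checkpoint = 8640).

def derived_n_ff(n_embd: int, n_mult: int) -> int:
    # Round 2/3 * 4 * n_embd up to a multiple of n_mult, roughly how the
    # loader derived n_ff at the time instead of reading it from the file.
    return ((2 * (4 * n_embd) // 3 + n_mult - 1) // n_mult) * n_mult

print(derived_n_ff(3200, 256))  # 8704 -- does not match the checkpoint's 8640
print(derived_n_ff(3200, 216))  # 8640 -- one n_mult value that makes it line up
```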
@BrickBee, which quantization format is broken for you? I can confirm that 3B Q4_0 and Q5_1 are working with the current master. I have the files up on https://huggingface.co/SlyEcho/open_llama_3b_ggml, and if you want to create them yourself, the Makefile and diff file can create all the models and checksums from scratch.
I can confirm that the quantized files you've linked work fine with the release version you've linked. My own quantized versions, created at the time of the PR, also still work correctly with the current version.
OK, I can trace it back to PR #1807, which for some reason starts to quantize a single tensor using Q6_K regardless of the user's choice of format, breaking those models when k-quants are not compiled in (they are optional) or not supported. This was actually reverted temporarily in #1711, but then added back. What was the thinking behind this change, @ikawrakow?
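A minimal Python sketch of the kind of guard being asked for here; this is purely illustrative (the function name and structure are hypothetical, not llama.cpp code), with QK_K = 256 assumed as the k-quant super-block size:

```python
# Hypothetical sketch of the guard under discussion: only upgrade the
# output tensor to Q6_K when k-quants are actually usable for it.

QK_K = 256  # k-quant super-block size

def pick_output_tensor_type(requested: str, k_quants_compiled: bool, row_size: int) -> str:
    if k_quants_compiled and row_size % QK_K == 0:
        return "Q6_K"      # more accurate choice for this one tensor
    return requested       # otherwise honor the user's requested format

# OpenLLaMA 3B output rows are 3200 wide, which is not a multiple of 256:
print(pick_output_tensor_type("Q4_0", k_quants_compiled=True, row_size=3200))  # Q4_0
print(pick_output_tensor_type("Q4_0", k_quants_compiled=True, row_size=4096))  # Q6_K
```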
Clearly, there wasn't enough thinking here ;-) More seriously, the decision to bring it back was based on a discussion with @ggerganov that we should use the more accurate Q6_K for that tensor. On that note, I wonder how the OpenLLaMA 3B model is being used. I downloaded the model, and if I fix things so I'm able to load it and run a perplexity calculation, I get wild values in excess of 2000. What am I missing? Is it because the tokenization is different and, if so, how do you use the 3B model? I would like to use it to work on adapting the k-quants to model sizes that are not divisible by 256, so any help is appreciated.
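As a quick sanity check on why the 3B sizes need a k-quants adaptation at all (assuming QK_K = 256 super-blocks and the usual OpenLLaMA 3B and LLaMA 7B dimensions):

```python
# Quick check (assumed values): k-quants pack each row in super-blocks of
# QK_K = 256 values, so every quantized row size must be a multiple of 256.

QK_K = 256
dims = {"3B n_embd": 3200, "3B n_ff": 8640, "7B n_embd": 4096}

for name, dim in dims.items():
    ok = dim % QK_K == 0
    print(f"{name}: {dim} % {QK_K} = {dim % QK_K} -> {'ok' if ok else 'needs adaptation'}")
```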
Conversion and fp16 inference work after applying this diff.
convert.py is still broken, and we didn't want to commit the crude hacks. But since the model has a free license, the files are up for download. Check my HF repo for the converted files, as well as the full Makefile to run the conversion yourself.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Expected Behavior
Model loads successfully and inference can be run.
Current Behavior
Model fails to load with shape error.
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
$ lscpu
$ uname -a
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
Failure Logs