Running LLaMA v2 with chat format. #507
Comments
For those interested, I've created this repo to handle LLaMA v2 chat completion. However, I still need to solve this issue.
@viniciusarruda thank you for reporting this issue and setting up that repo. I'm working on making the chat completion formatting configurable and will add that as an option.
Nice! I'm still trying to solve the issue presented here. It produces a different token when comparing the original Meta tokenizer and GGML. I think it is not related to this repo, since I'm calling the low-level API and the result is the same.
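For anyone who wants to check the mismatch themselves, here is a minimal sketch of the comparison, assuming you have Meta's original `tokenizer.model` and a local GGML model file (both paths below are placeholders):

```python
# Sketch only: compare Meta's SentencePiece tokenizer with llama.cpp's tokenizer
# as exposed by llama-cpp-python. Paths are placeholders.
from llama_cpp import Llama
from sentencepiece import SentencePieceProcessor

text = "[INST] Hello, how are you? [/INST]"

# Original Meta tokenizer.
sp = SentencePieceProcessor(model_file="tokenizer.model")
meta_tokens = sp.encode(text)

# llama.cpp tokenizer (vocab_only avoids loading the full weights).
llm = Llama(model_path="llama-2-7b-chat.ggmlv3.q2_K.bin", vocab_only=True)
cpp_tokens = llm.tokenize(text.encode("utf-8"), add_bos=False)

print("meta     :", meta_tokens)
print("llama.cpp:", cpp_tokens)
```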
With this formatting, do you recommend the chat version or the regular version of Llama 2?
Hey, so I get the same error, is this maybe related?

```python
from langchain import LlamaCpp
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.embeddings import LlamaCppEmbeddings
from langchain.indexes import VectorstoreIndexCreator

local_model_path = '../llama/llama.cpp/models/13B/ggml-model-q4_0.bin'

loader = TextLoader('example.txt')
llama_llm = LlamaCpp(model_path=local_model_path, n_ctx=2048, max_tokens=50)
llama_emb = LlamaCppEmbeddings(model_path=local_model_path)
index = VectorstoreIndexCreator(embedding=llama_emb).from_loaders([loader])

q = "What is the context about?"
print(index.query(q, llm=llama_llm))
```

When …
Ah thank you so much! I appreciate your work a ton, it's made my life so much easier <3
Since this is related to tokenization, I'll link an issue I found with llama.cpp that could be related: ggml-org/llama.cpp#2310
Sorry, I was checking and it seems not to be possible. To include the LLaMA V2 chat completion functionality in this repo, the original […] Also, since this is only valid for […]
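For context, the Llama 2 chat prompt layout under discussion looks roughly like the sketch below (based on Meta's reference implementation; note that in the original code the BOS/EOS markers are added as token ids rather than text, so treat this purely as an illustration):

```python
# Rough illustration of the Llama 2 chat prompt layout; BOS/EOS handling is simplified.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def format_llama2_chat(system: str, turns: list[tuple[str, str | None]]) -> str:
    """turns is a list of (user, assistant) pairs; the final assistant reply may be None."""
    prompt = ""
    for i, (user, assistant) in enumerate(turns):
        if i == 0:
            user = B_SYS + system + E_SYS + user
        prompt += f"{B_INST} {user.strip()} {E_INST}"
        if assistant is not None:
            prompt += f" {assistant.strip()} "
    return prompt

print(format_llama2_chat("You are a helpful assistant.", [("Hello!", None)]))
```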
Any news on this? I've managed to get something working with a sliding context window, but I'm still getting this error once it goes out of context range.
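In case it helps others, here is a hedged sketch of the sliding-window idea described above; `llm` is assumed to be a `llama_cpp.Llama` instance and `build_prompt` an assumed chat formatter like the one sketched earlier in the thread:

```python
# Sketch: drop the oldest turns until the tokenized prompt (plus room for the reply)
# fits inside the model's context window.
def fit_to_context(llm, system, turns, max_new_tokens=256):
    turns = list(turns)
    while turns:
        prompt = build_prompt(system, turns)  # assumed chat formatter
        n_prompt = len(llm.tokenize(prompt.encode("utf-8")))
        if n_prompt + max_new_tokens <= llm.n_ctx():
            return prompt
        turns.pop(0)  # discard the oldest (user, assistant) pair
    raise ValueError("A single turn does not fit in the context window.")
```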
@viniciusarruda, @ffernn-dev Could you share some code to reproduce it so it can be fixed? I have set truncate and max_tokens, but it is not working.
I also encountered this problem.
Hey, sorry, I've got exam season at the moment so I can't throw together a minimal reproduction, but the code at the top of this issue should do it, I think?
@viniciusarruda sorry for the delay, I'll start working on this. I needed to first merge in the gguf changes to handle getting an accurate model description; it should be possible to detect the model format now and implement a fix.
Thanks for the update! I appreciate the effort you're putting into making sure everything is set up correctly with the gguf changes. Can you confirm whether the llama-2 chat completion format is still being worked on? I'm looking forward to it. Thanks!
This should be solved now with the […]
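For readers landing here later, a usage sketch of the configurable chat format in recent llama-cpp-python releases looks roughly like this (the model filename is a placeholder; check the current docs for the exact parameter values):

```python
from llama_cpp import Llama

# Placeholder model path; the chat_format value selects the llama-2 prompt template.
llm = Llama(model_path="llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048, chat_format="llama-2")

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello, how are you?"},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```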
Prerequisites
Expected Behavior
Generate all tokens without any error.
Current Behavior
Note: I've omitted my file path.
Environment and Context
Running on Windows (PowerShell).
Python version: 3.10.9
Failure Information
When the error happens, the state of the variables at line 470 (captured in a screenshot in the original issue) results in a zero-range index for the assignment.
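As an illustration only (not the repo's actual code), the kind of failure a zero-range slice produces looks like this in NumPy:

```python
# Illustrative only: assigning a non-empty block into a slice that has collapsed to
# zero rows raises a broadcasting error, matching the zero-range index described above.
import numpy as np

n_ctx, n_vocab = 512, 32000
scores = np.zeros((n_ctx, n_vocab), dtype=np.single)

n_past, n_tokens = 512, 1  # the context is already full
logits = np.ones((n_tokens, n_vocab), dtype=np.single)

# scores[512:513, :] has shape (0, 32000), so this raises a ValueError.
scores[n_past : n_past + n_tokens, :] = logits
```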
Steps to Reproduce
Download the llama-2-7b-chat.ggmlv3.q2_K.bin model here
Run the following code:
Note: The code is basically this; a rough sketch is included after these notes.
I'm trying to use the correct chat format influenced by some discussions around this: [1, 2, 3]
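Since the snippet itself is only linked above, here is a hedged reconstruction of the kind of call that triggers the error: a llama-2 chat-formatted prompt run through llama-cpp-python until generation reaches the end of the context window (model path, prompt, and token counts are placeholders):

```python
from llama_cpp import Llama

llm = Llama(model_path="llama-2-7b-chat.ggmlv3.q2_K.bin", n_ctx=512)

prompt = (
    "[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\n"
    "Write a long story about a robot learning to paint. [/INST]"
)

# Requesting more tokens than the remaining context is what exposes the failure.
out = llm(prompt, max_tokens=2048)
print(out["choices"][0]["text"])
```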