LLama3 streaming repeats the previous request's first token. #287

Closed
mikutsky opened this issue Apr 19, 2024 · 8 comments · Fixed by #288

mikutsky commented Apr 19, 2024

Hi! I'm running into a problem where the previous request's first token is repeated in subsequent requests when using a stream. The prompt structure follows the Meta Llama 3 documentation. Could you explain why this is happening?

The output of a simple chat example looks like this:

The model name is meta/meta-llama-3-70b-instruct

You: Hi!
Assistant: Hi! How can I help you today?

You: Recommend me a Hemingway novel, please.
Assistant: Hi
I'd recommend "The Old Man and the Sea". It's a classic, concise, and powerful novel that showcases Hemingway's unique writing style.

You: I read it, please recommend something else.
Assistant: Hi
I
How about "A Farewell to Arms"? It's a romantic and tragic novel set during WWI, and it's considered one of Hemingway's best works.

You: It's great! Thank you! Bye!
Assistant: Hi
I
How
You're welcome! I'm glad you enjoyed the recommendation. Have a great day and happy reading! Bye!

Example code:

import os
from replicate.client import Client

replicate_api_key = os.getenv("REPLICATE_API_TOKEN", 'EMPTY')
replicate_model = os.getenv('REPLICATE_MODEL', 'meta/meta-llama-3-70b-instruct')
replicate_client = Client(api_token=replicate_api_key)

SYSTEM_PROMPT = 'You are a helpful assistant. Answer briefly!'
MESSAGES = []


def gen_llama3_prompt(sys_prompt=None, messages=None):
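    # Build a Llama 3 chat prompt following Meta's prompt template.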
    sys_prompt = '' if sys_prompt is None else sys_prompt
    messages = [] if messages is None else messages
    _result = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{sys_prompt}<|eot_id|>"
    for m in messages:
        if m['role'] == 'user':
            _result += f'<|start_header_id|>user<|end_header_id|>\n\n{m["content"]}<|eot_id|>'
        elif m['role'] == 'assistant':
            _result += f'<|start_header_id|>assistant<|end_header_id|>\n\n{m["content"]}<|eot_id|>'
    _result += '<|start_header_id|>assistant<|end_header_id|>\n\n'
    return _result


def print_answer(query=''):
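    # Stream the assistant's reply token by token, print it, and record both turns in MESSAGES.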
    message = {'role': 'user', 'content': query}
    answer = ''
    MESSAGES.append(message)
    for event in replicate_client.stream(
            "meta/meta-llama-3-70b-instruct",
            input={
                "top_p": 1e-5,
                "prompt": gen_llama3_prompt(SYSTEM_PROMPT, MESSAGES),
                "max_tokens": 512,
                "min_tokens": 0,
                "temperature": 1e-6
            }):
        token = str(event)
        answer += token
        print(token, end='')
    message = {'role': 'assistant', 'content': answer}
    MESSAGES.append(message)


if __name__ == '__main__':
    print(f'Model name is {replicate_model}')
    while True:
        q = input('\nYou: ')
        print('Assistant: ', end='')
        print_answer(q)
        if 'bye' in q.lower():
            break

Thanks for your help!

mattt (Contributor) commented Apr 19, 2024

Hi @mikutsky. Thanks for reporting this. Can you share any predictions for these? (Go to your replicate.com Dashboard, look under Predictions). Seeing that would help us tell if the problem is in the model or the client library.

Gusakovskyi commented Apr 19, 2024

Hi, I have the same issue.

mattt (Contributor) commented Apr 19, 2024

@Gusakovskyi @mikutsky We've confirmed that there's an issue with stop sequences for meta/meta-llama-3-70b-instruct, and we're working on a fix.

mikutsky (Author) commented:

> Hi @mikutsky. Thanks for reporting this. Can you share any predictions for these? (Go to your replicate.com Dashboard, look under Predictions). Seeing that would help us tell if the problem is in the model or the client library.

It looks like a client-library problem. I'm providing the info for the second query, because the later queries accumulate the mistakes in the prompt.

Everything looks correct on the dashboard:
[Screenshot: the prediction shown in the Replicate dashboard]

Here is the prompt for the second query, and the prompt is still correct:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant. Answer briefly!<|eot_id|><|start_header_id|>user<|end_header_id|>

Hi!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hi! How can I help you today?<|eot_id|><|start_header_id|>user<|end_header_id|>

I read it, please recommend something else.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

However, the console output contains an extra 'Hi' token:

You: Hi!
Assistant: Hi! How can I help you today?

You: I read it, please recommend something else.
Assistant: Hi
I'd be happy to! However, I need a bit more information. What type of content are you in the mood for? A book, article, podcast, or something else?

mattt (Contributor) commented Apr 19, 2024

@mikutsky We just pushed a new build of the model, which should address the stop sequence problem. Please give your client code another try and let me know if that's working for you now.

If not, could you please try calling replicate.stream in isolation? I'd like to rule out the use of input and mutable-state access in a loop, even though that should run synchronously and not be a problem.
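
Something like this would do it (a minimal sketch; the exact input values are illustrative):

```python
import replicate

# Stream the same fixed prompt twice with no surrounding loop state;
# if the second run still prepends a stray token, the client library
# is at fault rather than the calling code.
for _ in range(2):
    for event in replicate.stream(
        "meta/meta-llama-3-70b-instruct",
        input={"prompt": "Hi!", "max_tokens": 64},
    ):
        print(str(event), end="")
    print("\n---")
```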

mattt (Contributor) commented Apr 19, 2024

Actually, I'm able to reproduce this in isolation, so it does appear to be an issue with the client. Working on a fix now.

mattt added a commit that referenced this issue Apr 19, 2024
Fixes #287 

Given the following code:

```python
import replicate


def go():
    for event in replicate.stream(
        "meta/meta-llama-3-8b-instruct",
        input={
            "top_p": 0.9,
            "prompt": "Hi! Help me please:)",
            "max_tokens": 512,
            "min_tokens": 0,
            "temperature": 0.01,
            "stop_sequences": "<|end_of_text|>",
            "prompt_template": "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n",
            "presence_penalty": 0,
            "frequency_penalty": 1,
        },
    ):
        print(str(event), end="")
    print()


go()
print("---------------------------------")
go()
print("---------------------------------")
go()
print("---------------------------------")

```

The latest release repeats the same token once for each previous
invocation.

<details>

```
Hi there! I'd be happy to help you with whatever you need. What's on your mind?assistant

I'm glad you asked! I'm here to help you with any questions or problems you might have. Whether it's something big or something small, I'm here to help.

So, what's on your mind? Do you have a specific question or problem you'd like to talk about?assistant

I'm all ears! Take your time, and feel free to share as much or as little as you'd like.

Remember, everything we discuss is confidential and just between us. So,
---------------------------------
Hi
Hi there! I'd be happy to help you with whatever you need. What's on your mind?assistant

I'm glad you asked! I'm here to help you with any questions or problems you might have. Whether it's something big or something small, I'm here to help.

So, what's on your mind? Do you have a specific question or problem you'd like to talk about?assistant

I'm all ears! Take your time, and feel free to share as much or as little as you'd like.

Remember, everything we discuss is confidential and just between us. So,
---------------------------------
Hi
Hi
Hi there! I'd be happy to help you with whatever you need. What's on your mind?assistant

I'm glad you asked! I'm here to help you with any questions or problems you might have. Whether it's something big or something small, I'm here to help.

So, what's on your mind? Do you have a specific question or problem you'd like to talk about?assistant

I'm all ears! Take your time, and feel free to share as much or as little as you'd like.

Remember, everything we discuss is confidential and just between us. So,
---------------------------------
```

</details>

This is caused by incorrect initialization of the stream `Decoder` object: the decoder was reused across streams, so leftover buffered state replayed at the start of the next stream.
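
For illustration only — this is not the library's actual code — one classic way a decoder ends up shared across calls in Python is a mutable default argument, which is evaluated once at function definition time:

```python
# Hypothetical sketch of the failure mode; names and structure are
# invented for illustration and do not match replicate-python's source.

class Decoder:
    """Toy incremental decoder: buffers partial data, yields complete lines."""

    def __init__(self) -> None:
        self.buffer = ""

    def decode(self, chunk: str):
        self.buffer += chunk
        while "\n" in self.buffer:
            line, self.buffer = self.buffer.split("\n", 1)
            yield line


# BUG: the default Decoder() is created once, when stream() is defined,
# so every call shares the same decoder and inherits its leftover buffer.
def stream(chunks, decoder=Decoder()):
    for chunk in chunks:
        yield from decoder.decode(chunk)


# FIX: construct a fresh decoder for each stream.
def stream_fixed(chunks, decoder=None):
    decoder = Decoder() if decoder is None else decoder
    for chunk in chunks:
        yield from decoder.decode(chunk)


print(list(stream(["Hi", "\nthere"])))  # ['Hi']          ('there' stays buffered)
print(list(stream(["Hello\n"])))        # ['thereHello']  state leaked across calls
```

After applying this change, the code produces the correct behavior: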

<details>

```
Hi there! I'd be happy to help you with whatever you need. What's on your mind?assistant

I'm glad you asked! I'm here to help you with any questions or problems you might have. Whether it's something big or something small, I'm here to help.

So, what's on your mind? Do you have a specific question or problem you'd like to talk about?assistant

I'm all ears! Take your time, and feel free to share as much or as little as you'd like.

Remember, everything we discuss is confidential and just between us. So,
---------------------------------
Hi there! I'd be happy to help you with whatever you need. What's on your mind?assistant

I'm glad you asked! I'm here to help you with any questions or problems you might have. Whether it's something big or something small, I'm here to help.

So, what's on your mind? Do you have a specific question or problem you'd like to talk about?assistant

I'm all ears! Take your time, and feel free to share as much or as little as you'd like.

Remember, everything we discuss is confidential and just between us. So,
---------------------------------
Hi there! I'd be happy to help you with whatever you need. What's on your mind?assistant

I'm glad you asked! I'm here to help you with any questions or problems you might have. Whether it's something big or something small, I'm here to help.

So, what's on your mind? Do you have a specific question or problem you'd like to talk about?assistant

I'm all ears! Take your time, and feel free to share as much or as little as you'd like.

Remember, everything we discuss is confidential and just between us. So,
---------------------------------

```

</details>

Signed-off-by: Mattt Zmuda <[email protected]>
mattt (Contributor) commented Apr 19, 2024

@mikutsky @Gusakovskyi Thanks again for reporting. This should be fixed by 0.25.2.

Please let me know if you continue to see this behavior.
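
Upgrading the client package, e.g. `pip install --upgrade replicate`, should pull in the fix.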

mikutsky (Author) commented:

> @mikutsky @Gusakovskyi Thanks again for reporting. This should be fixed by 0.25.2.
>
> Please let me know if you continue to see this behavior.

Thanks a lot! It works!
