
Code llama output gibberish code. #883


Closed
esmeetu opened this issue Aug 26, 2023 · 9 comments · Fixed by #1004

Comments

@esmeetu
Member

esmeetu commented Aug 26, 2023

Tested using the infilling example from:
https://huggingface.co/docs/transformers/main/model_doc/code_llama

Env:

  1. model: Code Llama 7b Instruct
  2. vLLM main branch 0.1.4 on 4b6f069.
  3. transformers: 4.33.0.dev0 (built from the main branch at 960807f)
  4. torch_dtype=torch.float16 (my GPU doesn't support bfloat16)
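
For reference, the transformers run roughly follows the infilling example from the doc page linked above (a sketch, not my exact script; it assumes CodeLlamaTokenizer is available and loads the Instruct checkpoint in float16):

```
from transformers import CodeLlamaTokenizer, LlamaForCausalLM
import torch

model_id = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = CodeLlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).cuda()

# Infilling prompt: the model fills in the <FILL_ME> span.
PROMPT = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''
input_ids = tokenizer(PROMPT, return_tensors="pt")["input_ids"].cuda()
generated_ids = model.generate(input_ids, max_new_tokens=128)

# Decode only the newly generated infill and splice it back into the prompt.
filling = tokenizer.batch_decode(generated_ids[:, input_ids.shape[1]:],
                                 skip_special_tokens=True)[0]
print(PROMPT.replace("<FILL_ME>", filling))
```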

Generated text from transformers:

def remove_non_ascii(s: str) -> str:
    """ Remove non-ASCII characters from a string.

    Args:
        s (str): The string to remove non-ASCII characters from.

    Returns:
        str: The string with non-ASCII characters removed.
    """
    result = ""
    for c in s:
        if ord(c) < 128:
            result += c
    return result


def remove_non_ascii_and_spaces(s: str) -> str:
    """ Remove non-ASCII and space characters from a string.


    return result

Generated text from vLLM:

<- Prompt ->:
def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result

<- Answer ->:

   For comments

    """
        Remove non ascii inplace of special characters from a given string.""""
        non ascii => blank space, punctuation""""""codepointpython| nap string""" in Python"""test_safe
    u""
    
    among'".
   328
    <PDFutable       # tring control chars's to be "-"a"

    value.
// Dasailss
    ASCII characters so:""""
   
    X" 
    :5
	 Non-""
    <

Engine Log:

Received request cmpl-31a97cd2e50f4c18a9ac9e5fc4df4920: prompt: 'def remove_non_ascii(s: str) -> str:\n    """ <FILL_ME>\n    return result\n', sampling params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, temperature=1.0, top_p=1.0, top_k=-1, use_beam_search=False, stop=[], ignore_eos=False, max_tokens=128, logprobs=None), prompt token ids: [1, 822, 3349, 29918, 5464, 29918, 294, 18869, 29898, 29879, 29901, 851, 29897, 1599, 851, 29901, 13, 1678, 9995, 529, 3738, 2208, 29918, 2303, 29958, 13, 1678, 736, 1121, 13].
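
For completeness, the vLLM side is essentially the following (a sketch of the repro, not my exact script; it uses the offline LLM API with dtype="float16"):

```
from vllm import LLM, SamplingParams

prompt = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''

# float16 because my GPU doesn't support bfloat16.
llm = LLM(model="codellama/CodeLlama-7b-Instruct-hf", dtype="float16")
params = SamplingParams(temperature=1.0, top_p=1.0, max_tokens=128)

outputs = llm.generate([prompt], params)
print("<- Prompt ->:")
print(prompt)
print("<- Answer ->:")
print(outputs[0].outputs[0].text)
```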
@the-crypt-keeper

I am experiencing the same issue on both codellama/CodeLlama-7b-Instruct-hf and TheBloke/CodeLlama-7B-Instruct-fp16; the output is nonsense.

Running on an A10G at commit 4b6f069b6fbb4f2ef7d4c6a62140229be61c5dd3 with transformers 4.32.

@ltz0120

ltz0120 commented Aug 27, 2023

I got the same problem. The output from every version of CodeLlama I tried is nonsense.

@tridungduong16

I think vLLM does not work with Code Llama. Try WizardCoder instead.

 in ensure_divisibility
    assert numerator % denominator == 0, "{} is not divisible by {}".format(
AssertionError: 32001 is not divisible by 2

@TomExMachina

TomExMachina commented Aug 31, 2023

I am experiencing the same gibberish for both CodeLlama and WizardCoder using vLLM via SkyPilot. Has anyone tried turning down the temperature? I was hoping to lower it through the OpenAI API feature but had trouble getting that working on SkyPilot.

@LopezGG

LopezGG commented Sep 5, 2023

The Hugging Face version of Code Llama has "torch_dtype": "bfloat16", which is not supported by H100. I had to change this to float16 to get it working, and in that setting the output was gibberish even with temperature 0. I managed to get my hands on an A100, which supports bfloat16; with bfloat16 the responses are similar to the ones we see with Meta's inference code.

@the-crypt-keeper

Thanks to the helpful comment from @matheper in #900 about safetensors being the culprit, I got the instruct version working tonight 🎉

INFO 09-06 02:59:49 llm_engine.py:70] Initializing an LLM engine with config: model='codellama/CodeLlama-7b-Instruct-hf', tokenizer='codellama/CodeLlama-7b-Instruct-hf', tokenizer_mode=auto, trust_remote_code=True, dtype=torch.bfloat16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)

Hints:

  1. Download codellama/CodeLlama-7b-Instruct-hf with ignore_patterns=["*.safetensors"]; you want only the .bin weights (see the sketch after this list).
  2. pip install transformers==4.33
  3. Use a proper instruct-compatible prompt format; the simplest one that works is:
<s>[INST] Write a python function that adds two numbers [/INST]
  4. dtype has to be bfloat16. I've tested on A10G and A100.
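
A sketch of hints 1, 3, and 4 in code (assuming huggingface_hub is installed; the sampling values below are only examples):

```
# Hint 1: download only the .bin weights, skipping the safetensors files.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    "codellama/CodeLlama-7b-Instruct-hf",
    ignore_patterns=["*.safetensors"],
)

# Hints 3 and 4: bfloat16 dtype and the [INST] prompt format.
from vllm import LLM, SamplingParams

llm = LLM(model=local_dir, dtype="bfloat16")
prompt = "<s>[INST] Write a python function that adds two numbers [/INST]"
outputs = llm.generate([prompt], SamplingParams(temperature=0.2, max_tokens=256))
print(outputs[0].outputs[0].text)
```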

@c3-avidmych

Code Llama 13B Instruct works for me with the latest vLLM source code, set up like this:

Dockerfile

FROM nvidia/cuda:11.8.0-devel-ubuntu22.04

ENV MODEL_NAME=codellama/CodeLlama-13b-Instruct-hf \
    CUDA_HOME=/usr/local/cuda \
    HOST=0.0.0.0 \
    PORT=8000

WORKDIR /app

# Install system dependencies
RUN apt-get update && \
    apt-get install -y git python3 python3-pip nvidia-container-toolkit && \
    rm -rf /var/lib/apt/lists/*

RUN pip3 install torch

COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

EXPOSE 8000

COPY entrypoint.sh ./
ENTRYPOINT [ "./entrypoint.sh" ]

requirements.txt

git+https://github.com/huggingface/transformers.git@main
accelerate
git+https://[email protected]/vllm-project/vllm.git@main

entrypoint.sh

#!/bin/bash
/usr/bin/python3 -m vllm.entrypoints.openai.api_server \
  --model $MODEL_NAME \
  --host $HOST \
  --port $PORT

Request:

curl "http://$EXTERNAL_SERVER_IP:8000/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "codellama/CodeLlama-13b-Instruct-hf",
        "prompt": "<s>[INST]Write a python function that returns the first 10 number of the fibonacci sequence[/INST]",
        "max_tokens": 1000,
        "temperature": 0
    }' | jq -r '.choices[0].text'

============= Output ============
Here is a function that returns the first 10 numbers of the Fibonacci sequence:

def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)

print(fibonacci(10))

This function uses a recursive approach to calculate the Fibonacci sequence. It starts by checking if the input n is less than or equal to 1, in which case the result is simply n. If n is greater than 1, the function calls itself with n-1 and n-2 as arguments, and adds the results together to get the next number in the sequence.

The print statement at the end of the function is just for testing purposes, and can be removed if you want to use the function in a larger program.
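
For anyone scripting this, the same request from Python with the requests library (the server address below is a placeholder for your own host):

```
import requests

SERVER = "http://<EXTERNAL_SERVER_IP>:8000"  # placeholder for your server address

payload = {
    "model": "codellama/CodeLlama-13b-Instruct-hf",
    "prompt": "<s>[INST]Write a python function that returns the first 10 numbers of the fibonacci sequence[/INST]",
    "max_tokens": 1000,
    "temperature": 0,
}
resp = requests.post(f"{SERVER}/v1/completions", json=payload, timeout=300)
print(resp.json()["choices"][0]["text"])
```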

@esmeetu
Member Author

esmeetu commented Sep 7, 2023

@LopezGG @the-crypt-keeper @c3-avidmych @TomExMachina As far as I know, vLLM doesn't support Code Llama with float16; RoPE has a precision problem at that dtype.
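
A minimal sketch (not vLLM's actual code) of how half precision can hurt here: Code Llama uses rope_theta = 1e6, which is already above float16's maximum of about 65504, so computing the RoPE inverse frequencies in float16 collapses them (whether or not this is the exact failure path inside vLLM):

```
import torch

dim = 128            # head size
base = 1_000_000.0   # Code Llama's rope_theta (regular Llama 2 uses 10_000)

# In float16 the base overflows to inf, so all but the first inverse
# frequency collapse to zero and the rotary embedding degenerates.
exp16 = torch.arange(0, dim, 2, dtype=torch.float16) / dim
inv_freq_fp16 = 1.0 / (torch.tensor(base, dtype=torch.float16) ** exp16)
print(inv_freq_fp16[:4])  # first entry 1.0, the rest 0.0

# Computing in float32 (and casting the result afterwards if needed) is fine.
exp32 = torch.arange(0, dim, 2, dtype=torch.float32) / dim
inv_freq_fp32 = 1.0 / (base ** exp32)
print(inv_freq_fp32[:4])  # smoothly decaying frequencies
```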

@esmeetu
Member Author

esmeetu commented Sep 8, 2023

Solved by applying #870. You need to build vLLM from source yourself. Good luck! 😇

@esmeetu esmeetu closed this as completed Sep 8, 2023
jikunshang pushed a commit to jikunshang/vllm that referenced this issue Mar 14, 2025
1. Use VLLM_ENABLE_RUNTIME_DEQUANT=1 to run with runtime dequantize.
2. Use VLLM_DMOE_DYNAMIC_SCALE=1 to run with dynamic dequantize + dynamic MOE.
3. Accuracy looks good, as below:
```
VLLM_DMOE_DYNAMIC_SCALE=1 python scripts/run_lm_eval.py -l 64 --batch_size 8
{"gsm8k": {"alias": "gsm8k", "exact_match,strict-match": 0.96875, "exact_match_stderr,strict-match": 0.021921011700381302, "exact_match,flexible-extract": 0.96875, "exact_match_stderr,flexible-extract": 0.021921011700381302}}{"e2e time(secs)": 938.2986768169999}
```

---------

Signed-off-by: Chendi Xue <[email protected]>