Code Llama outputs gibberish code. #883
Comments
I am experiencing the same issue on both, running on an A10G at commit
I have the same problem. The results output by any version of CodeLlama are nonsense.
I think vLLM does not work with Code Llama. Try WizardCoder instead.
I am experiencing the same gibberish for both CodeLlama and WizardCoder using vLLM via SkyPilot. Has anyone tried turning down the temperature? I was hoping to turn it down using the OpenAI API feature but had trouble getting that working on SkyPilot.
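For anyone who wants to experiment with a lower temperature against the OpenAI-compatible server, here is a minimal sketch, assuming the legacy `openai<1.0` Python client and a server reachable at `http://localhost:8000` (adjust the base URL for a SkyPilot deployment; model name and prompt are illustrative):

```python
# Sketch: send a low-temperature completion request to a vLLM
# OpenAI-compatible server using the legacy openai<1.0 client.
import openai

openai.api_key = "EMPTY"                     # vLLM's server does not check the key
openai.api_base = "http://localhost:8000/v1" # assumed server address

completion = openai.Completion.create(
    model="codellama/CodeLlama-7b-Instruct-hf",
    prompt="<s>[INST]Write a python function that reverses a string[/INST]",
    max_tokens=256,
    temperature=0.0,  # greedy decoding; raise slightly if the output degenerates
)
print(completion.choices[0].text)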
The Hugging Face version of Code Llama has `"torch_dtype": "bfloat16"`, which is not supported by H100. I had to change this to float16 to get it working. In that setting, even with temperature 0, the outputs were gibberish. I managed to get my hands on an A100, which supports bfloat16. With bfloat16 the responses are similar to the ones we see from Meta's inference code.
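To make the dtype explicit instead of relying on the checkpoint's `torch_dtype`, vLLM's offline `LLM` API accepts a `dtype` argument. A minimal sketch, assuming a GPU with bfloat16 support; the model name and prompt are illustrative:

```python
# Sketch: force the weight dtype when loading Code Llama offline in vLLM.
# dtype="bfloat16" needs bf16-capable hardware; "float16" is the fallback
# that several comments in this thread report as producing gibberish.
from vllm import LLM, SamplingParams

llm = LLM(model="codellama/CodeLlama-7b-Instruct-hf", dtype="bfloat16")
params = SamplingParams(temperature=0.0, max_tokens=128)

# The tokenizer adds <s> itself, so the prompt omits it here.
prompt = "[INST]Write a python function that returns the first 10 numbers of the fibonacci sequence[/INST]"
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```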
Thanks to the helpful comment from @matheper in #900 about safetensors being the culprit, I got the instruct version working tonight 🎉

```
INFO 09-06 02:59:49 llm_engine.py:70] Initializing an LLM engine with config: model='codellama/CodeLlama-7b-Instruct-hf', tokenizer='codellama/CodeLlama-7b-Instruct-hf', tokenizer_mode=auto, trust_remote_code=True, dtype=torch.bfloat16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=1, seed=0)
```

Hints:
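If the safetensors shards really are the culprit as #900 suggests, one hypothetical workaround is to download only the `.bin` weights and point vLLM at the local copy. A sketch, assuming `huggingface_hub` is installed; the model name is illustrative:

```python
# Hypothetical workaround sketch, assuming the #900 diagnosis: skip the
# *.safetensors shards so vLLM falls back to the pytorch_model-*.bin files.
from huggingface_hub import snapshot_download
from vllm import LLM

local_dir = snapshot_download(
    "codellama/CodeLlama-7b-Instruct-hf",
    ignore_patterns=["*.safetensors"],  # download only the .bin weights
)
llm = LLM(model=local_dir, dtype="bfloat16")
```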
Code Llama 13B Instruct works for me with the latest vLLM source code, like this.

Dockerfile

requirements.txt

entrypoint.sh:

```
/usr/bin/python3 -m vllm.entrypoints.openai.api_server \
    --model $MODEL_NAME \
    --host $HOST \
    --port $PORT
```

Request:

```
curl "http://$EXTERNAL_SERVER_IP:8000/v1/completions" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "codellama/CodeLlama-13b-Instruct-hf",
        "prompt": "<s>[INST]Write a python function that returns the first 10 number of the fibonacci sequence[/INST]",
        "max_tokens": 1000,
        "temperature": 0
    }' | jq -r '.choices[0].text'
```

============= Output ============
This function uses a recursive approach to calculate the Fibonacci sequence. It starts by checking if the input
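For completeness, the same request can be issued from Python instead of curl. A small sketch using `requests`; the server address is a placeholder, matching `$EXTERNAL_SERVER_IP` above:

```python
# Sketch: Python equivalent of the curl request above against the
# vLLM OpenAI-compatible completions endpoint.
import requests

resp = requests.post(
    "http://EXTERNAL_SERVER_IP:8000/v1/completions",  # placeholder address
    json={
        "model": "codellama/CodeLlama-13b-Instruct-hf",
        "prompt": "<s>[INST]Write a python function that returns the first 10 number of the fibonacci sequence[/INST]",
        "max_tokens": 1000,
        "temperature": 0,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["text"])
```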
@LopezGG @the-crypt-keeper @c3-avidmych @TomExMachina AFAIK vLLM doesn't support Code Llama with float16; RoPE has precision problems there.
You can solve this by applying #870. You need to build vLLM yourself. Good luck! 😇
1. Use `VLLM_ENABLE_RUNTIME_DEQUANT=1` to run with runtime dequantize.
2. Use `VLLM_DMOE_DYNAMIC_SCALE=1` to run with dynamic dequantize + dynamic MOE.
3. Accuracy looks good, as below:

```
VLLM_DMOE_DYNAMIC_SCALE=1 python scripts/run_lm_eval.py -l 64 --batch_size 8
{"gsm8k": {"alias": "gsm8k", "exact_match,strict-match": 0.96875, "exact_match_stderr,strict-match": 0.021921011700381302, "exact_match,flexible-extract": 0.96875, "exact_match_stderr,flexible-extract": 0.021921011700381302}}
{"e2e time(secs)": 938.2986768169999}
```

Signed-off-by: Chendi Xue <[email protected]>
Test on this: https://huggingface.co/docs/transformers/main/model_doc/code_llama

Env:
- Code Llama 7b Instruct
- vLLM main branch (0.1.4, commit 4b6f069)

Generated text from transformers:

Generated text from vLLM:

Engine Log:
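For reference, a rough sketch of the transformers side of this comparison, loosely following the linked doc page; the prompt is illustrative, not the one used in the original report, and `device_map="auto"` assumes `accelerate` is installed:

```python
# Sketch: generate from Code Llama with transformers in bfloat16,
# for comparison against the vLLM output.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-Instruct-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The tokenizer prepends <s> on its own, so it is omitted from the prompt.
prompt = "[INST]Write a python function that returns the first 10 numbers of the fibonacci sequence[/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```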