The latest release was said to have a 60% performance improvement. I'm unable to use quantization since my GPU is not compatible, but compared to 0.1.8 the runtime is unchanged at 2.1 seconds. Am I doing something wrong in my config?
Here are my specs:
OS: Ubuntu 20.04
CUDA Version: 11.2
CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz
CPU RAM: 200
GPU: Tesla V100-SXM2 32GB
from vllm import LLM, SamplingParams

prompts = """
Summarize the message below, delimited by triple backticks, using short bullet points.
```{message}```
BULLET POINT SUMMARY:
"""

llm = LLM(model='meta-llama/Llama-2-13b-chat-hf', trust_remote_code=True, dtype="float16",
          tensor_parallel_size=1, gpu_memory_utilization=0.95, disable_log_stats=True,
          tokenizer='hf-internal-testing/llama-tokenizer')
sampling_params = SamplingParams(n=1, best_of=1, presence_penalty=0, frequency_penalty=0,
                                 temperature=0, top_p=1.0, top_k=-1, use_beam_search=False,
                                 stop="<|endoftext|>", max_tokens=1024)
outputs = llm.generate(prompts, sampling_params)
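As an aside, the `{message}` placeholder in the prompt template above is never substituted before the prompt is sent to the model. Filling it with `str.format()` first might look like the sketch below (the example message text is hypothetical):

```python
# Hypothetical example: fill the {message} placeholder via str.format()
# before passing the prompt to generate().
template = (
    "Summarize the message below, delimited by triple backticks, "
    "using short bullet points.\n"
    "```{message}```\n"
    "BULLET POINT SUMMARY:\n"
)

message = "Standup moved to 3pm; please review the report beforehand."  # placeholder input
prompt = template.format(message=message)
print(prompt)
```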
Hi @nutmilk10, in v0.2.0 we mostly optimized for throughput. The core optimizations were to the de-tokenizer (#984) and the sampler (#1048). These reduce a lot of overhead when many requests are batched, so single-request latency may not improve much.
Now we are focusing on reducing latency. Please stay tuned for the upcoming optimizations!
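Since the v0.2.0 gains show up under batching, one way to see them is to pass many prompts to a single `generate()` call and measure requests per second rather than single-request latency. A minimal, framework-agnostic timing sketch (the `generate_fn` callable is a stand-in for something like `lambda ps: llm.generate(ps, sampling_params)`):

```python
import time

def throughput(generate_fn, prompts):
    """Time one batched call and return requests per second.

    generate_fn: any callable that accepts a list of prompts,
    e.g. lambda ps: llm.generate(ps, sampling_params).
    """
    start = time.perf_counter()
    generate_fn(prompts)
    elapsed = time.perf_counter() - start
    return len(prompts) / elapsed

# Usage sketch (requires a running vLLM engine):
# rps = throughput(lambda ps: llm.generate(ps, sampling_params),
#                  [prompt_text] * 32)
```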