AFAIK, we already support batching but do not have a benchmark for it yet. We should review the implementation to see if we can make any improvements.
Discussion
Resources
Each request (1 prompt per request) sent to the server is prepared and added to a task queue. A background process gathers prompts from the task queue, builds a batch, processes it, and then pushes the results to an output queue.
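A minimal sketch of that producer/consumer flow (illustrative Python only; the actual cortex/llama.cpp implementation is C++, and the batch size and timeout here are assumptions):

```python
# Sketch of the queue-based batching flow described above
# (illustrative, not the actual cortex/llama.cpp implementation).
import queue
import threading

MAX_BATCH_SIZE = 8      # assumed batch limit
BATCH_TIMEOUT_S = 0.05  # how long the worker waits to fill a batch

task_queue = queue.Queue()  # incoming prompts, one entry per request
output_queues = {}          # request_id -> per-request result queue

def handle_request(request_id, prompt):
    """Called per incoming request: enqueue the prompt and wait for its result."""
    output_queues[request_id] = queue.Queue()
    task_queue.put((request_id, prompt))
    return output_queues[request_id].get()  # blocks until the batch worker finishes

def batch_worker(run_batch):
    """Background worker: drain the task queue, build a batch, run it, push results."""
    while True:
        batch = [task_queue.get()]  # block until at least one prompt is available
        try:
            while len(batch) < MAX_BATCH_SIZE:
                batch.append(task_queue.get(timeout=BATCH_TIMEOUT_S))
        except queue.Empty:
            pass
        ids, prompts = zip(*batch)
        for request_id, completion in zip(ids, run_batch(list(prompts))):
            output_queues[request_id].put(completion)

# threading.Thread(target=batch_worker, args=(my_batched_generate,), daemon=True).start()
```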
=> The current implementation of cortex llama.cpp can support batching, but we need to adjust some params to sync with the latest llama.cpp implementation and add documentation to Readme.md. A benchmark script that runs a batch test to verify the implementation is also needed.
Result when running the script on a 3090 (Linux):
{'message': 'Model already loaded'}
Finished in 27.825968503952026 s
Total token: 6108
Throughput when run parallel: 219.50718441776795 tokens/s
############################
Finished in 38.07835125923157 s
Total token: 4966
Throughput when run in sequence: 130.4153104264477 tokens/s
###########################
--- 70.19260907173157 seconds ---
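For reference, a minimal sketch of how such a parallel-vs-sequential throughput comparison could be structured (the endpoint URL, payload fields, and prompt set below are assumptions; this is not necessarily the exact script that produced the numbers above):

```python
# Sketch of a parallel-vs-sequential throughput benchmark (illustrative only;
# endpoint URL and payload fields are assumptions).
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:3928/v1/chat/completions"       # assumed server address
PROMPTS = ["Write a short story about a robot."] * 16   # assumed workload

def run_one(prompt):
    resp = requests.post(URL, json={
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    })
    return resp.json()["usage"]["completion_tokens"]

def measure(label, runner):
    start = time.time()
    tokens = sum(runner())
    elapsed = time.time() - start
    print(f"Finished in {elapsed} s")
    print(f"Total token: {tokens}")
    print(f"Throughput when run {label}: {tokens / elapsed} tokens/s")

# Parallel: all prompts in flight at once, so the server can batch them.
with ThreadPoolExecutor(max_workers=len(PROMPTS)) as pool:
    measure("parallel", lambda: list(pool.map(run_one, PROMPTS)))

# Sequential: one prompt at a time, no batching opportunity.
measure("in sequence", lambda: [run_one(p) for p in PROMPTS])
```

The parallel run keeps many prompts in flight so the server can group them into a single batch, which is what the higher tokens/s figure above reflects.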