Commit 9ff8d4d

Server: add tests for batch size, different seeds
1 parent 4dba7e8 commit 9ff8d4d

2 files changed: +156 -80 lines changed

examples/server/tests/features/results.feature

+56 -32
@@ -7,44 +7,16 @@ Feature: Results
     And a model file tinyllamas/split/stories15M-00001-of-00003.gguf from HF repo ggml-org/models
     And a model file test-model-00001-of-00003.gguf
     And 128 as batch size
-    And 256 KV cache size
+    And 1024 KV cache size
     And 128 max tokens to predict
+    And continuous batching
 
-  Scenario Outline: Multi users completion
+  Scenario Outline: consistent results with same seed
     Given <n_slots> slots
-    And continuous batching
     Then the server is starting
     Then the server is healthy
 
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
+    Given 4 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 42
 
     Given concurrent completion requests
     Then the server is busy
@@ -55,3 +27,55 @@ Feature: Results
       | n_slots |
       | 1 |
       | 2 |
+
+  Scenario Outline: different results with different seed
+    Given <n_slots> slots
+    Then the server is starting
+    Then the server is healthy
+
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 42
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 43
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 44
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\nSummary:" with seed 45
+
+    Given concurrent completion requests
+    Then the server is busy
+    Then the server is idle
+    And all slots are idle
+    Then all predictions are different
+    Examples:
+      | n_slots |
+      | 1 |
+      | 2 |
+
+  Scenario Outline: consistent results with same seed and varying batch size
+    Given 4 slots
+    And <temp> temperature
+    # And 0 as draft
+    Then the server is starting
+    Then the server is healthy
+
+    Given 1 prompts "Write a very long story about AI." with seed 42
+    And concurrent completion requests
+    # Then the server is busy # Not all slots will be utilized.
+    Then the server is idle
+    And all slots are idle
+
+    Given <n_parallel> prompts "Write a very long story about AI." with seed 42
+    And concurrent completion requests
+    # Then the server is busy # Not all slots will be utilized.
+    Then the server is idle
+    And all slots are idle
+
+    Then all predictions are equal
+    Examples:
+      | n_parallel | temp |
+      | 1 | 0.0 |
+      | 2 | 0.0 |
+      | 4 | 0.0 |
+      | 1 | 1.0 |
+      # FIXME: These tests fail on master. The problem seems to be the unified KV cache.
+      # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
+      # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574 .
+      # | 2 | 1.0 |
+      # | 4 | 1.0 |
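
The new scenarios end in two assertions, "all predictions are equal" and "all predictions are different". The step definitions behind them are not part of this file's diff, so the following is only a minimal sketch of what such checks could look like, assuming each completion is collected as a plain string. The function names `all_predictions_equal` and `all_predictions_different` are hypothetical and are not taken from the repository.

```python
from itertools import combinations


def all_predictions_equal(predictions):
    """Return True when every completion text matches the first one.

    Hypothetical helper: mirrors the intent of the
    "Then all predictions are equal" step, not its real implementation.
    """
    return all(p == predictions[0] for p in predictions[1:])


def all_predictions_different(predictions):
    """Return True when no two completion texts are identical.

    Hypothetical helper: mirrors the intent of the
    "Then all predictions are different" step.
    """
    return all(a != b for a, b in combinations(predictions, 2))


if __name__ == "__main__":
    # Same seed, same prompt: the scenario expects identical outputs.
    same_seed = ["Once upon a time", "Once upon a time"]
    # Different seeds: the scenario expects pairwise distinct outputs.
    different_seeds = ["Once upon a time", "Far away in orbit", "A red hood"]
    print(all_predictions_equal(same_seed))
    print(all_predictions_different(different_seeds))
```

A pairwise comparison is used for the "different" check so that a single repeated completion among the four seeds is enough to fail it. Comparing only against the first prediction would miss a duplicate between, say, the second and third.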
