@@ -7,44 +7,16 @@ Feature: Results
     And a model file tinyllamas/split/stories15M-00001-of-00003.gguf from HF repo ggml-org/models
     And a model file test-model-00001-of-00003.gguf
     And 128 as batch size
-    And 256 KV cache size
+    And 1024 KV cache size
     And 128 max tokens to predict
+    And continuous batching

-  Scenario Outline: Multi users completion
+  Scenario Outline: consistent results with same seed
     Given <n_slots> slots
-    And continuous batching
     Then the server is starting
     Then the server is healthy

-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
-
-    Given 42 as seed
-    And a prompt:
-      """
-      Write a very long story about AI.
-      """
+    Given 4 prompts "Title: Little Red Riding Hood But In Space\n\n Summary:" with seed 42

     Given concurrent completion requests
     Then the server is busy
@@ -55,3 +27,55 @@ Feature: Results
       | n_slots |
       | 1 |
       | 2 |
+
+  Scenario Outline: different results with different seed
+    Given <n_slots> slots
+    Then the server is starting
+    Then the server is healthy
+
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\n Summary:" with seed 42
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\n Summary:" with seed 43
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\n Summary:" with seed 44
+    Given 1 prompts "Title: Little Red Riding Hood But In Space\n\n Summary:" with seed 45
+
+    Given concurrent completion requests
+    Then the server is busy
+    Then the server is idle
+    And all slots are idle
+    Then all predictions are different
+    Examples:
+      | n_slots |
+      | 1 |
+      | 2 |
+
+  Scenario Outline: consistent results with same seed and varying batch size
+    Given 4 slots
+    And <temp> temperature
+    # And 0 as draft
+    Then the server is starting
+    Then the server is healthy
+
+    Given 1 prompts "Write a very long story about AI." with seed 42
+    And concurrent completion requests
+    # Then the server is busy # Not all slots will be utilized.
+    Then the server is idle
+    And all slots are idle
+
+    Given <n_parallel> prompts "Write a very long story about AI." with seed 42
+    And concurrent completion requests
+    # Then the server is busy # Not all slots will be utilized.
+    Then the server is idle
+    And all slots are idle
+
+    Then all predictions are equal
+    Examples:
+      | n_parallel | temp |
+      | 1 | 0.0 |
+      | 2 | 0.0 |
+      | 4 | 0.0 |
+      | 1 | 1.0 |
+      # FIXME: These tests fail on master. The problem seems to be the unified KV cache.
+      # See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
+      # and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574 .
+      # | 2 | 1.0 |
+      # | 4 | 1.0 |
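
The `Then all predictions are equal` and `Then all predictions are different` steps boil down to a pairwise comparison of the completion texts collected from the concurrent requests. The step implementations are not part of this diff, so the following is only a minimal sketch of that comparison; the helper names `all_predictions_equal` and `all_predictions_different` and the plain list-of-strings input are assumptions made for illustration, not the test suite's actual code.

```python
from itertools import combinations


def all_predictions_equal(contents):
    """Return True if every pair of completion texts is identical.

    `contents` is a hypothetical list of completion strings, one per request.
    An empty or single-element list trivially passes.
    """
    return all(a == b for a, b in combinations(contents, 2))


def all_predictions_different(contents):
    """Return True if no two completion texts are identical."""
    return all(a != b for a, b in combinations(contents, 2))


if __name__ == "__main__":
    same_seed = ["Once upon a time", "Once upon a time", "Once upon a time"]
    mixed_seeds = ["Once upon a time", "In a galaxy", "Long ago"]
    print(all_predictions_equal(same_seed))
    print(all_predictions_different(mixed_seeds))
```

Comparing every pair, rather than each result against only the first, matters for the "different" check: it also catches two later requests that happen to match each other.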