* initial commit
* Corey's simplifications
* finish making the second model static
* ready, time for beautification
* ready for review
* moved the code to examples
* fix eos logic
* add argument num_tokens_to_generate
* initial commit
* change order
* Update examples/codegen/README.md
Co-authored-by: corey-nm <[email protected]>
---------
Co-authored-by: corey-nm <[email protected]>
We need to do this because the existing with_past implementations assume there is no padding in the inputs. With DeepSparse, we need to use a static sequence length, which means our offset for the embeddings will depend on how many non-padded tokens we receive.
The new line checks this with the attention_mask. At this point in the code, the attention_mask has been transformed from a tensor of 0s and 1s into a tensor of `float.min` and `0.0`. So when we compare `attention_mask == 0.0`, we are actually selecting everywhere the original attention_mask was 1.
We also need to subtract 1 from this count, because the attention mask is applied AFTER the kv cache is concatenated with the new token, which means the attention mask actually covers sequence length + 1 positions. So we subtract 1 to get the current sequence length.
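A minimal sketch of this computation, assuming a PyTorch additive mask as described above (the helper name and the example tensor are hypothetical, not the actual pipeline code):

```python
import torch

# Hypothetical helper illustrating the logic described above; the name
# and signature are assumptions, not the real implementation.
def current_sequence_length(attention_mask: torch.Tensor) -> int:
    # At this point the mask is additive: 0.0 where the original 0/1
    # attention_mask was 1, and the dtype's minimum value where it was 0.
    num_attended = int((attention_mask == 0.0).sum())
    # The mask is applied AFTER the kv cache is concatenated with the new
    # token, so it covers sequence_length + 1 positions; subtract 1 to
    # recover the current sequence length.
    return num_attended - 1

# Example: 3 real tokens plus 1 padded position in a static-length window.
neg_inf = torch.finfo(torch.float32).min
mask = torch.tensor([0.0, 0.0, 0.0, 0.0, neg_inf])  # 4 attended positions
print(current_sequence_length(mask))  # -> 3
```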
```python
out = codegen(sequences=["def hello_world():", "def fibonacci(x):"])
for seq in out.sequences:
    print(seq)
```
Modifying pipeline behaviour:
1. By adding the argument `deterministic=False`, the next token of the sequence will not be chosen deterministically (using argmax), but will be sampled from the probability distribution.
2. By setting `sampling_temperature` when `deterministic=False`, we can allow more or less randomness in the sampling method (see https://towardsdatascience.com/how-to-sample-from-language-models-682bceb97277).
3. By setting `num_tokens_to_generate`, we specify exactly how many tokens to generate per input, as shown in the sketch below.
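A minimal usage sketch combining these arguments with the `codegen` pipeline from the earlier example. This assumes the arguments are accepted at call time (they may instead belong at pipeline construction), and the values shown are illustrative, not documented defaults:

```python
# Illustrative values; these are assumptions, not library defaults.
out = codegen(
    sequences=["def fibonacci(x):"],
    deterministic=False,        # sample instead of taking the argmax
    sampling_temperature=0.8,   # lower -> sharper, higher -> flatter distribution
    num_tokens_to_generate=64,  # generate exactly 64 tokens for this input
)
for seq in out.sequences:
    print(seq)
```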