
Commit 6417b54

[TextGeneration] Add new text_generation.md with examples and tables with generation_config attributes (#1284)
* add text_generation feature readme table
* update table
* move to new file, add in different inference runs
* add more examples
* remove confusing prints
1 parent dcd810b commit 6417b54

2 files changed: +202 -0 lines

src/deepsparse/transformers/README.md (+1)
@@ -354,6 +354,7 @@ deepsparse.benchmark zoo:nlp/question_answering/bert-base/pytorch/huggingface/sq
To learn more about benchmarking, refer to the appropriate documentation.
Also, check out our [Benchmarking tutorial](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/benchmark)!

## Tutorials:
For a deeper dive into using transformers within the Neural Magic ecosystem, refer to the detailed tutorials on our [website](https://neuralmagic.com/):
- [Token Classification: Named Entity Recognition](https://neuralmagic.com/use-cases/sparse-named-entity-recognition/)
text_generation.md (new file, +201)
@@ -0,0 +1,201 @@
# Pipeline Creation

```python
from deepsparse import TextGeneration

MODEL_PATH = "path/to/model/or/zoostub"
text_pipeline = TextGeneration(model_path=MODEL_PATH)
```

# Inference Runs

```python
PROMPT = "how are you?"
SECOND_PROMPT = "what book is really popular right now?"
```

### All defaults
```python
text_result = text_pipeline(prompt=PROMPT)
```

### Enable Streaming
```python
generations = text_pipeline(prompt=PROMPT, streaming=True)
for text_generation in generations:
    print(text_generation)
```

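Streaming can be combined with the other inference-time options shown in this guide; a minimal sketch, assuming `streaming=True` accepts the same generation kwargs as a regular call:

```python
# Stream the response while also capping its length (illustrative only)
generations = text_pipeline(prompt=PROMPT, streaming=True, max_length=25)
for text_generation in generations:
    print(text_generation)
```
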
### Multiple Inputs
```python
PROMPTS = [PROMPT, SECOND_PROMPT]
text_output = text_pipeline(prompt=PROMPTS)

prompt_output = text_output.generations[0]
second_prompt_output = text_output.generations[1]
```

### Use `generation_config` to control the generated results
- Limit the generated output size using the `max_length` property
- For a complete list of supported attributes, see the tables below

```python
generation_config = {"max_length": 10}
generations = text_pipeline(prompt=PROMPT, generation_config=generation_config)
```

### Use the transformers `GenerationConfig` object for the `generation_config`

```python
from transformers import GenerationConfig

generation_config = GenerationConfig()
generation_config.max_length = 10

generations = text_pipeline(prompt=PROMPT, generation_config=generation_config)
```

### Use just `kwargs`
- The attributes supported through the `generation_config` are also supported through `kwargs`

```python
generations = text_pipeline(prompt=PROMPT, max_length=10)
```

### Use the GenerationConfig during pipeline creation
- Every inference run with this pipeline will apply this generation config, unless one is also provided at inference time

```python
MODEL_PATH = "path/to/model/or/zoostub"
generation_config = {"max_length": 10}
text_pipeline = TextGeneration(model_path=MODEL_PATH, generation_config=generation_config)

generations = text_pipeline(prompt=PROMPT)

# Override the generation config by providing a config at inference time
generation_config = {"max_length": 25}
generations = text_pipeline(prompt=PROMPT, generation_config=generation_config)
```

### Get more than one response for a given prompt

```python
generation_config = {"num_return_sequences": 2}
generations = text_pipeline(prompt=PROMPT, generation_config=generation_config)
```

### Get more than one unique response

```python
generation_config = {"num_return_sequences": 2, "do_sample": True}
generations = text_pipeline(prompt=PROMPT, generation_config=generation_config)
```

### Use multiple prompts and generate multiple outputs for each prompt

```python
PROMPTS = [PROMPT, SECOND_PROMPT]

generations = text_pipeline(prompt=PROMPTS, num_return_sequences=2, do_sample=True, max_length=100)
prompt_outputs = generations.generations[0]
second_prompt_outputs = generations.generations[1]

print("Outputs from the first prompt: ")
for output in prompt_outputs:
    print(output)
    print("\n")

print("Outputs from the second prompt: ")
for output in second_prompt_outputs:
    print(output)
    print("\n")
```

Output:
```
Outputs from the first prompt:
text=" are you coping better with holidays?\nI'm been reall getting good friends and helping friends as much as i can so it's all good." score=None finished=True finished_reason='stop'

text="\nI'm good... minor panic attacks but aside from that I'm good." score=None finished=True finished_reason='stop'

Outputs from the second prompt:
text='\nHAVING A GOOD TIME by Maya Angelou; How to Be a Winner by Peter Enns; BE CAREFUL WHAT YOU WHORE FOR by Sarah Bergman; 18: The Basic Ingredients of a Good Life by Jack Canfield.\nI think you might also read The Sympathy of the earth by Charles Darwin, if you are not interested in reading books. Do you write? I think it will help you to refine your own writing.' score=None finished=True finished_reason='stop'

text=' every school or publication I have looked at has said the same two books.\nIt depends on the school/master. AIS was the New York Times Bestseller forever, kicked an ass in the teen fiction genre for many reasons, a lot of fiction picks like that have been around a while hence popularity. And most science fiction and fantasy titles (but not romance or thriller) are still popular.' score=None finished=True finished_reason='stop'
```

### Output scores
```python
generations = text_pipeline(prompt=PROMPT, output_scores=True)
```

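The scores come back on the same output object used throughout this guide; a minimal sketch of reading them, assuming each entry in `generations.generations` exposes the `text` and `score` fields shown in the example output above:

```python
generations = text_pipeline(prompt=PROMPT, output_scores=True)

# `score` holds the generated logits when output_scores=True
# (it is None otherwise, as in the earlier example output)
first_generation = generations.generations[0]
print(first_generation.text)
print(first_generation.score)
```
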
<h1> Text Generation GenerationConfig Features Supported </h1>

<h2> Parameters controlling the output length: </h2>

| Feature | Description | Deepsparse Default | HuggingFace Default | Supported |
| :--- | :----: | :----: | :----: | ---:|
| max_length | Maximum length of generated tokens, equal to the length of the input prompt + max_new_tokens. Overridden by max_new_tokens | 1024 | 20 | Yes |
| max_new_tokens | Maximum number of tokens to generate, ignoring prompt tokens. | None | None | Yes |
| min_length | Minimum length of generated tokens, equal to the length of the input prompt + min_new_tokens. Overridden by min_new_tokens | - | 0 | No |
| min_new_tokens | Minimum number of tokens to generate, ignoring prompt tokens. | - | None | No |
| max_time | - | - | - | No |

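For example, the supported `max_new_tokens` attribute can be passed the same way `max_length` was earlier, to cap the response independently of the prompt length (a minimal sketch with an arbitrary limit):

```python
# Generate at most 32 new tokens, regardless of prompt length
generation_config = {"max_new_tokens": 32}
generations = text_pipeline(prompt=PROMPT, generation_config=generation_config)
```
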
<br/>
<h2> Parameters for manipulation of the model output logits </h2>

| Feature | Description | Deepsparse Default | HuggingFace Default | Supported |
| :--- | :----: | :----: | :----: | ---:|
| top_k | The number of highest-probability vocabulary tokens to keep for top-k filtering | 0 | 50 | Yes |
| top_p | Keep the generated tokens whose cumulative probability is >= top_p | 0.0 | 1.0 | Yes |
| repetition_penalty | Penalty applied when generating new tokens: the frequency of each token generated so far is subtracted from its corresponding logit value | 0.0 | 1.0 | Yes |
| temperature | The temperature to use when sampling from the probability distribution computed from the logits. Higher values result in more random samples. Should be greater than 0.0 | 1.0 | 1.0 | Yes |
| typical_p | - | - | - | No |
| epsilon_cutoff | - | - | - | No |
| eta_cutoff | - | - | - | No |
| diversity_penalty | - | - | - | No |
| length_penalty | - | - | - | No |
| bad_words_ids | - | - | - | No |
| force_words_ids | - | - | - | No |
| renormalize_logits | - | - | - | No |
| constraints | - | - | - | No |
| forced_bos_token_id | - | - | - | No |
| forced_eos_token_id | - | - | - | No |
| remove_invalid_values | - | - | - | No |
| exponential_decay_length_penalty | - | - | - | No |
| suppress_tokens | - | - | - | No |
| begin_suppress_tokens | - | - | - | No |
| forced_decoder_ids | - | - | - | No |

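As an illustration, the supported logit controls above can be combined in a single `generation_config`; a minimal sketch with arbitrary values (`do_sample`, from the next table, enables sampling):

```python
# Sample with top-k and nucleus (top-p) filtering, a repetition penalty,
# and a lower temperature; the values here are only illustrative
generation_config = {
    "do_sample": True,
    "top_k": 40,
    "top_p": 0.9,
    "repetition_penalty": 1.2,
    "temperature": 0.8,
    "max_new_tokens": 64,
}
generations = text_pipeline(prompt=PROMPT, generation_config=generation_config)
```
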
<br/>
<h2> Parameters that control the generation strategy used </h2>

| Feature | Description | Deepsparse Default | HuggingFace Default | Supported |
| :--- | :----: | :----: | :----: | ---:|
| do_sample | If True, will apply sampling from the probability distribution computed from the logits | False | False | Yes |

<br/>
<h2> Parameters for output variables: </h2>

| Feature | Description | Deepsparse Default | HuggingFace Default | Supported |
| :--- | :----: | :----: | :----: | ---:|
| num_return_sequences | The number of sequences generated for each prompt | 1 | 1 | Yes |
| output_scores | Whether to return the generated logits | False | False | Yes |
| return_dict_in_generate | - | - | - | No |

<br/>
<h2> Special Tokens: </h2>

| Feature | Description | Deepsparse Default | HuggingFace Default | Supported |
| :--- | :----: | :----: | :----: | ---:|
| pad_token_id | - | - | - | No |
| bos_token_id | - | - | - | No |
| eos_token_id | - | - | - | No |

<br/>
