These techniques result in significantly more performant and smaller models with limited to no effect on the baseline metrics.

This integration currently supports several fundamental NLP tasks:
- **Question Answering** - posing questions about a document
- **Sentiment Analysis** - assigning a sentiment to a piece of text
- **Text Classification** - assigning a label or class to a piece of text (e.g. duplicate question pairing)
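
All of these tasks are served through the same pipeline interface. A minimal sketch, assuming a local folder of model files (the path below is a placeholder):

```python
from deepsparse import Pipeline

# Create a pipeline for one of the supported tasks; model_path may point to a
# local folder of model files or to a SparseZoo stub (described further below).
sa_pipeline = Pipeline.create(
    task="sentiment-analysis",
    model_path="./my-sparse-model",  # placeholder path
)

print(sa_pipeline(sequences="DeepSparse makes CPU inference fast."))
```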
By default, to deploy a transformer using the DeepSparse Engine, you must supply the model in the ONNX format along with the HuggingFace supporting files.
This grants the engine the flexibility to serve any model in a framework-agnostic environment.

The DeepSparse pipelines require the following files within a folder on the local server to properly load a Transformers model:
- `model.onnx`: The exported Transformers model in the [ONNX format](https://github.com/onnx/onnx).
- `tokenizer.json`: The [HuggingFace compatible tokenizer configuration](https://huggingface.co/docs/transformers/fast_tokenizers) used with the model.
- `config.json`: The [HuggingFace compatible configuration file](https://huggingface.co/docs/transformers/main_classes/configuration) used with the model.
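
A quick way to sanity-check that a folder matches the layout above (a sketch; the `/trained_model` path is a placeholder):

```python
from pathlib import Path

# Hypothetical pre-flight check: confirm the folder holds the files listed
# above before handing it to a DeepSparse pipeline.
model_dir = Path("/trained_model")  # placeholder model folder
required = ("model.onnx", "tokenizer.json", "config.json")
missing = [name for name in required if not (model_dir / name).is_file()]
if missing:
    raise FileNotFoundError(f"model folder is missing required files: {missing}")
```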
Below we describe two possibilities to obtain the required structure.
The export creates the `model.onnx` file in the directory of your `model_path` (e.g. `/trained_model/model.onnx`).
The `tokenizer.json` and `config.json` files are stored under the `model_path` folder as well, so a DeepSparse pipeline can be directly instantiated by using that folder after export (e.g. `/trained_model/`).
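
For example (a sketch, assuming the export wrote its files to `/trained_model/`):

```python
from deepsparse import Pipeline

# Point the pipeline at the export folder; the model.onnx, tokenizer.json and
# config.json inside it are loaded from there.
qa_pipeline = Pipeline.create(
    task="question-answering",
    model_path="/trained_model/",  # folder produced by the export step
)

output = qa_pipeline(
    question="Which files does the folder contain?",
    context="The folder contains model.onnx, tokenizer.json and config.json.",
)
print(output)
```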
#### SparseZoo Stub
Alternatively, you can skip the ONNX model export entirely by using Neural Magic's [SparseZoo](https://sparsezoo.neuralmagic.com/). The SparseZoo contains pre-sparsified models, and SparseZoo stubs enable you to reference any model on the SparseZoo in a convenient and predictable way.
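
For instance (illustrative only; the stub below follows the SparseZoo naming pattern, but browse the SparseZoo for current models):

```python
from deepsparse import Pipeline

# A SparseZoo stub can be passed directly as model_path; DeepSparse downloads
# the pre-sparsified model and its supporting files automatically.
stub = (
    "zoo:nlp/question_answering/bert-base/pytorch/huggingface/"
    "squad/12layer_pruned80_quant-none-vnni"  # example stub
)
qa_pipeline = Pipeline.create(task="question-answering", model_path=stub)
```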