Commit 5b33261

Author: Sara Adkins
Parent: c3b313c

Replace Dead SparseZoo Stubs in Documentation (#1279)

* Fixing dead benchmark readme stubs
* Update transformers readme
* Update resnet50_benchmark.py
* style

3 files changed, +22 -22 lines

examples/benchmark/resnet50_benchmark.py (+13 -13)

````diff
@@ -123,52 +123,52 @@ def main():
     results = benchmark_model(
         (
             "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/"
-            "pruned-conservative"
+            "pruned80_quant-none-vnni"
         ),
         sample_inputs,
         batch_size=batch_size,
         num_cores=num_cores,
         num_iterations=num_iterations,
         num_warmup_iterations=num_warmup_iterations,
     )
-    print(f"ResNet-50 v1 Pruned Conservative FP32 {results}")
+    print(f"ResNet-50 v1 Pruned 80 INT8 {results}")
+
+    if not VNNI:
+        print(
+            "WARNING: VNNI instructions not detected, "
+            "quantization (INT8) speedup not well supported"
+        )
 
     results = benchmark_model(
         (
             "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/"
-            "pruned-moderate"
+            "pruned90-none"
         ),
         sample_inputs,
         batch_size=batch_size,
         num_cores=num_cores,
         num_iterations=num_iterations,
         num_warmup_iterations=num_warmup_iterations,
     )
-    print(f"ResNet-50 v1 Pruned Moderate FP32 {results}")
-
-    if not VNNI:
-        print(
-            "WARNING: VNNI instructions not detected, "
-            "quantization (INT8) speedup not well supported"
-        )
+    print(f"ResNet-50 v1 Pruned 90 FP32 {results}")
 
     results = benchmark_model(
         (
             "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/"
-            "pruned_quant-moderate"
+            "pruned90_quant-none"
         ),
         sample_inputs,
         batch_size=batch_size,
         num_cores=num_cores,
         num_iterations=num_iterations,
         num_warmup_iterations=num_warmup_iterations,
     )
-    print(f"ResNet-50 v1 Pruned Moderate INT8 {results}")
+    print(f"ResNet-50 v1 Pruned 90 INT8 {results}")
 
     results = benchmark_model(
         (
             "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/"
-            "pruned95_quant-none"
+            "pruned95_uniform_quant-none"
         ),
         sample_inputs,
         batch_size=batch_size,
````
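Since the point of this commit is swapping dead SparseZoo stubs for live ones, a cheap structural sanity check can catch malformed stubs before they reach `benchmark_model`. The helper below is a hypothetical sketch (not part of deepsparse or sparsezoo); it only checks that a stub string has the `zoo:domain/sub_domain/architecture/framework/repo/dataset/name` shape seen in the stubs above, not that the model actually exists in the zoo.

```python
# Hypothetical helper, not a deepsparse/sparsezoo API: structural check only.
def looks_like_sparsezoo_stub(stub: str) -> bool:
    """Return True if `stub` matches the zoo:<7 slash-separated parts> shape."""
    if not stub.startswith("zoo:"):
        return False
    parts = stub[len("zoo:"):].split("/")
    # Stubs in this file have 7 non-empty segments:
    # domain/sub_domain/architecture/framework/repo/dataset/name
    return len(parts) >= 7 and all(parts)

# The replacement stub from the diff above passes the shape check:
print(looks_like_sparsezoo_stub(
    "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned90-none"
))  # → True
```

A check like this only guards against typos in the path structure; whether a stub is "dead" (removed from the zoo) can only be determined by resolving it against the SparseZoo itself.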

src/deepsparse/benchmark/README.md (+6 -6)

````diff
@@ -20,10 +20,10 @@ limitations under the License.
 
 ### Quickstart
 
-After `pip install deepsparse`, the benchmark tool is available on your CLI. For example, to benchmark a dense BERT ONNX model fine-tuned on the SST2 dataset where the model path is the minimum input required to get started, run:
+After `pip install deepsparse`, the benchmark tool is available on your CLI. For example, to benchmark a dense BERT ONNX model fine-tuned on the MNLI dataset where the model path is the minimum input required to get started, run:
 
 ```
-deepsparse.benchmark zoo:nlp/text_classification/bert-base/pytorch/huggingface/sst2/base-none
+deepsparse.benchmark zoo:nlp/text_classification/bert-base/pytorch/huggingface/mnli/base-none
 ```
 __ __
 ### Usage
@@ -94,7 +94,7 @@ optional arguments:
 Example CLI command for benchmarking an ONNX model from the SparseZoo and saving the results to a `benchmark.json` file:
 
 ```
-deepsparse.benchmark zoo:nlp/text_classification/bert-base/pytorch/huggingface/sst2/base-none -x benchmark.json
+deepsparse.benchmark zoo:nlp/text_classification/bert-base/pytorch/huggingface/mnli/base-none -x benchmark.json
 ```
 Output of the JSON file:
 
@@ -108,10 +108,10 @@ To run a sparse FP32 MobileNetV1 at batch size 16 for 10 seconds for throughput
 deepsparse.benchmark zoo:cv/classification/mobilenet_v1-1.0/pytorch/sparseml/imagenet/pruned-moderate --batch_size 16 --time 10 --scenario async --num_streams 8
 ```
 
-To run a sparse quantized INT8 6-layer BERT at batch size 1 for latency:
+To run a sparse quantized INT8 BERT at batch size 1 for latency:
 
 ```
-deepsparse.benchmark zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned_quant_6layers-aggressive_96 --batch_size 1 --scenario sync
+deepsparse.benchmark zoo:nlp/question_answering/bert-large/pytorch/huggingface/squad/pruned90_quant-none --batch_size 1 --scenario sync
 ```
 __ __
 ### ⚡ Inference Scenarios
@@ -341,4 +341,4 @@ Mean Latency Breakdown (ms/batch):
 engine_prompt_prefill_single: 19.0412
 engine_token_generation: 19603.0353
 engine_token_generation_single: 19.1170
-```
+```
````
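The latency breakdown at the end of the README's sample output reports mean latencies in ms/batch. Converting such a figure to throughput is simple arithmetic, sketched below; this is an illustrative helper, not a deepsparse API, and assumes the latency value covers exactly one batch.

```python
# Illustrative arithmetic only (not part of deepsparse): ms/batch -> items/sec.
def throughput_items_per_sec(mean_latency_ms: float, batch_size: int) -> float:
    # One batch completes every `mean_latency_ms` milliseconds, so
    # batches/sec = 1000 / mean_latency_ms, scaled by items per batch.
    return batch_size * 1000.0 / mean_latency_ms

# e.g. 19.1170 ms per generated token at batch size 1, as in the sample output:
print(round(throughput_items_per_sec(19.1170, 1), 1))  # → 52.3
```

For throughput-scenario runs (`--scenario async` with multiple streams), the reported items/sec already aggregates across streams, so this single-stream conversion does not apply.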

src/deepsparse/transformers/README.md (+3 -3)

````diff
@@ -1,4 +1,4 @@
-# Hugging Face Transformer Inference Pipelines
+# Hugging Face Transformer Inference Pipelines
 
 
 DeepSparse allows accelerated inference, serving, and benchmarking of sparsified [Hugging Face Transformer](https://github.com/huggingface/transformers) models.
@@ -208,7 +208,7 @@ Spinning up:
 ```bash
 deepsparse.server \
   task sentiment-analysis \
-  --model_path "zoo:nlp/sentiment_analysis/bert-base/pytorch/huggingface/sst2/12layer_pruned80_quant-none-vnni"
+  --model_path "zoo:nlp/sentiment_analysis/bert-base/pytorch/huggingface/sst2/pruned80_quant-none-vnni"
 ```
 
 Making a request:
@@ -314,7 +314,7 @@ Spinning up:
 ```bash
 deepsparse.server \
   task token-classification \
-  --model_path "zoo:nlp/token_classification/bert-base/pytorch/huggingface/conll2003/12layer_pruned80_quant-none-vnni"
+  --model_path "zoo:nlp/token_classification/bert-base/pytorch/huggingface/conll2003/pruned90-none"
 ```
 
 Making a request:
````
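The servers spun up in the hunks above accept JSON request bodies. As a minimal sketch, the payload for the sentiment-analysis endpoint might be built as below; the `"sequences"` field name is assumed from deepsparse pipeline conventions, and the server's auto-generated `/docs` page should be consulted to confirm the exact schema for each task.

```python
import json

# Hypothetical request body for the sentiment-analysis server shown above;
# field name assumed, verify against the server's /docs schema.
payload = json.dumps({"sequences": "The updated model stub downloads correctly."})
print(payload)
```

The same pattern applies to the token-classification endpoint, though its input field name may differ per task.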
