See the License for the specific language governing permissions and
limitations under the License.
-->

## 📜 Benchmarking ONNX Models

`deepsparse.benchmark` is a command-line interface (CLI) tool for benchmarking the DeepSparse Engine with ONNX models. The tool parses the arguments, downloads and compiles the network into the engine, generates input tensors, and executes the model according to the chosen scenario. By default, it chooses a multi-stream (asynchronous) mode to optimize for throughput.

### Quickstart

After `pip install deepsparse`, the benchmark tool is available on your CLI. For example, you can benchmark a dense BERT ONNX model fine-tuned on the SST2 dataset, where the model path is the only required input.
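
One way to do so is with a SparseZoo stub for such a model (the stub below is illustrative; a local ONNX path works the same way):

```bash
deepsparse.benchmark zoo:nlp/text_classification/bert-base/pytorch/huggingface/sst2/base-none
```
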
In most cases, the default options deliver good performance, so benchmarking can be as simple as running the command with a SparseZoo model stub or a path to your local ONNX model. If you prefer to customize benchmarking for your use case, run `deepsparse.benchmark -h` or with `--help` to view the usage options:

CLI Arguments:

```
positional arguments:
  model_path            Path to an ONNX model file or SparseZoo model stub.

optional arguments:
  -h, --help            show this help message and exit.
  -b BATCH_SIZE, --batch_size BATCH_SIZE
                        The batch size to run the analysis for. Must be
                        greater than 0.
  -shapes INPUT_SHAPES, --input_shapes INPUT_SHAPES
                        Override the shapes of the inputs, i.e. -shapes
                        "[1,2,3],[4,5,6],[7,8,9]" results in input0=[1,2,3]
                        input1=[4,5,6] input2=[7,8,9].
  -ncores NUM_CORES, --num_cores NUM_CORES
                        The number of physical cores to run the analysis on,
                        defaults to all physical cores available on the
                        system.
```

#### Synchronous (Single-stream) Scenario

Set by the `--scenario sync` argument, the goal metric is latency per batch (ms/batch). This scenario submits a single inference request at a time to the engine, recording the time taken for each request to return an output. This mimics an edge deployment scenario.

The latency value reported is the mean of all latencies recorded during the execution period for the given batch size.

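For example, a latency-oriented run can be requested explicitly (the model path below is a placeholder):

```bash
deepsparse.benchmark /path/to/model.onnx --scenario sync
```
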
#### Asynchronous (Multi-stream) Scenario

Set by the `--scenario async` argument, the goal metric is throughput in items per second (i/s). This scenario submits `--num_streams` concurrent inference requests to the engine, recording the time taken for each request to return an output. This mimics a model server or bulk batch deployment scenario.

The throughput value reported is calculated from the number of inferences completed within the execution time, multiplied by the batch size.

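For instance, a throughput-oriented run might pin the scenario and stream count explicitly (placeholder path; the `--num_streams` value is illustrative):

```bash
deepsparse.benchmark /path/to/model.onnx --scenario async --num_streams 4
```
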
#### Example Benchmarking Output of Synchronous vs. Asynchronous

**BERT 3-layer FP32 Sparse Throughput**

There is no need to add a *scenario* argument, since `async` is the default option.
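
A sketch of the invocation (substitute the model's SparseZoo stub or a local ONNX path for the placeholder):

```bash
deepsparse.benchmark <bert-3layer-fp32-sparse-stub-or-onnx-path>
```
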
## Image Classification

This integration enables accelerated inference, serving, and benchmarking of sparsified image classification models. It allows for leveraging the DeepSparse Engine to run sparsified image classification inference with GPU-class performance directly on the CPU.

The DeepSparse Engine takes advantage of sparsity within neural networks to reduce compute as well as accelerate memory-bound workloads. The Engine is particularly effective when leveraging sparsification methods such as [pruning](https://neuralmagic.com/blog/pruning-overview/) and [quantization](https://arxiv.org/abs/1609.07061). These techniques result in significantly more performant and smaller models with limited to no effect on the baseline metrics.

## Getting Started

Before you start your adventure with the DeepSparse Engine, make sure that your machine is compatible with our [hardware requirements].

### Installation

```bash
pip install deepsparse
```

### Model Format

By default, to deploy image classification models using the DeepSparse Engine, the model should be supplied in the [ONNX] format. This grants the Engine the flexibility to serve any model in a framework-agnostic manner.

Below we describe two possibilities to obtain the required ONNX model.

#### Exporting the ONNX file from the contents of a local checkpoint

This pathway is relevant if you intend to deploy a model created using the [SparseML] library. For more information, refer to the appropriate integration documentation in [SparseML].

1. The output of the [SparseML] training is saved to the output directory `/{save_dir}` (e.g. `/trained_model`).
2. Depending on the chosen framework, the model files are saved to `model_path` = `/{save_dir}/{framework_name}/{model_tag}` (e.g. `/trained_model/pytorch/resnet50/`).
3. To generate an ONNX model, refer to the [script for image classification ONNX export](https://github.com/neuralmagic/sparseml/blob/main/src/sparseml/pytorch/image_classification/export.py), as sketched below. This creates a `model.onnx` file in the parent directory of your `model_path`.
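
A sketch of invoking that export script (the flag names below are assumptions modeled on typical SparseML CLIs; run the script with `--help` for its exact arguments):

```bash
# Hypothetical flags; verify with: python export.py --help
python src/sparseml/pytorch/image_classification/export.py \
    --checkpoint-path /trained_model/pytorch/resnet50/model.pth \
    --arch-key resnet50 \
    --dataset imagenet \
    --dataset-path /path/to/imagenet
```
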
#### Directly using the SparseZoo stub

Alternatively, you can skip the ONNX export process by downloading all the required model data directly from Neural Magic's [SparseZoo](https://sparsezoo.neuralmagic.com/).

Example:
```python
from sparsezoo import Model

# you can lookup an appropriate model stub here: https://sparsezoo.neuralmagic.com/
stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none"

# fetches the model files from SparseZoo; .path is the local download directory
model = Model(stub)
print(model.path)
```
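
Benchmarking the downloaded model then reduces to pointing `deepsparse.benchmark` at the same stub (the `--batch_size` value mirrors the output below):

```bash
deepsparse.benchmark zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none --batch_size 1
```
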
Sample output:

```
Original Model Path: zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none
Batch Size: 1
Scenario: async
Throughput (items/sec): 299.2372
Latency Mean (ms/batch): 16.6677
Latency Median (ms/batch): 16.6748
Latency Std (ms/batch): 0.1728
Iterations: 2995
```

To learn more about benchmarking, refer to the appropriate documentation. Also, check out our [Benchmarking tutorial](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/benchmark)!

## Tutorials

For a deeper dive into using image classification models within the Neural Magic ecosystem, refer to the detailed tutorials on our [website](https://neuralmagic.com/):

- [CV Use Cases](https://neuralmagic.com/use-cases/#computervision)

## Support
For Neural Magic Support, sign up or log in to our [Deep Sparse Community Slack](https://join.slack.com/t/discuss-neuralmagic/shared_invite/zt-q1a1cnvo-YBoICSIw3L1dmQpjBeDurQ). Bugs, feature requests, or additional questions can also be posted to our [GitHub Issue Queue](https://github.com/neuralmagic/deepsparse/issues).