Commit bcb13c7

peri044 and gs-olive committed
chore: update perf tooling to add dynamo options (pytorch#2423)
Signed-off-by: Dheeraj Peri <[email protected]>
Co-authored-by: George S <[email protected]>
1 parent f1d7771 commit bcb13c7

11 files changed, 444 additions and 561 deletions

.github/workflows/docker_builder.yml

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ on:
     branches:
       - main
       - nightly
+      - release/2.1
 
 # If pushes to main are made in rapid succession,
 # cancel existing docker builds and use newer commits

tools/perf/README.md

Lines changed: 14 additions & 78 deletions
@@ -3,9 +3,10 @@
 This is a comprehensive Python benchmark suite to run perf runs using different supported backends. Following backends are supported:
 
 1. Torch
-2. Torch-TensorRT
-3. FX-TRT
-4. TensorRT
+2. Torch-TensorRT [Torchscript]
+3. Torch-TensorRT [Dynamo]
+4. Torch-TensorRT [torch_compile]
+5. TensorRT
 
 
 Note: Please note that for ONNX models, user can convert the ONNX model to TensorRT serialized engine and then use this package.
@@ -22,9 +23,6 @@ Benchmark scripts depends on following Python packages in addition to requiremen
 
 ```
 ./
-├── config
-│   ├── vgg16_trt.yml
-│   └── vgg16.yml
 ├── models
 ├── perf_run.py
 ├── hub.py
@@ -35,87 +33,20 @@ Benchmark scripts depends on following Python packages in addition to requiremen
 ```
 
 
-
-* `config` - Directory which contains sample yaml configuration files for VGG network.
 * `models` - Model directory
-* `perf_run.py` - Performance benchmarking script which supports torch, torch_tensorrt, fx2trt, tensorrt backends
+* `perf_run.py` - Performance benchmarking script which supports torch, ts_trt, torch_compile, dynamo, tensorrt backends
 * `hub.py` - Script to download torchscript models for VGG16, Resnet50, EfficientNet-B0, VIT, HF-BERT
 * `custom_models.py` - Script which includes custom models other than torchvision and timm (eg: HF BERT)
 * `utils.py` - utility functions script
 * `benchmark.sh` - This is used for internal performance testing of VGG16, Resnet50, EfficientNet-B0, VIT, HF-BERT.
 
 ## Usage
 
-There are two ways you can run a performance benchmark.
-
-### Using YAML config files
-
-To run the benchmark for a given configuration file:
-
-```python
-python perf_run.py --config=config/vgg16.yml
-```
-
-There are two sample configuration files added.
-
-* vgg16.yml demonstrates a configuration with all the supported backends (Torch, Torch-TensorRT, TensorRT)
-* vgg16_trt.yml demonstrates how to use an external TensorRT serialized engine file directly.
-
-
-### Supported fields
-
-| Name | Supported Values | Description |
-| ----------------- | ------------------------------------ | ------------------------------------------------------------ |
-| backend | all, torchscript, fx2trt, torch, torch_tensorrt, tensorrt | Supported backends for inference. "all" implies the last four methods in the list at left, and "torchscript" implies the last three (excludes fx path) |
-| input | - | Input binding names. Expected to list shapes of each input bindings |
-| model | - | Configure the model filename and name |
-| model_torch | - | Name of torch model file and name (used for fx2trt) (optional) |
-| filename | - | Model file name to load from disk. |
-| name | - | Model name |
-| runtime | - | Runtime configurations |
-| device | 0 | Target device ID to run inference. Range depends on available GPUs |
-| precision | fp32, fp16 or half, int8 | Target precision to run inference. int8 cannot be used with 'all' backend |
-| calibration_cache | - | Calibration cache file expected for torch_tensorrt runtime in int8 precision |
-
-Additional sample use case:
-
-```
-backend:
-  - torch
-  - torch_tensorrt
-  - tensorrt
-  - fx2trt
-input:
-  input0:
-    - 3
-    - 224
-    - 224
-  num_inputs: 1
-model:
-  filename: model.plan
-  name: vgg16
-model_torch:
-  filename: model_torch.pt
-  name: vgg16
-runtime:
-  device: 0
-  precision:
-    - fp32
-    - fp16
-```
-
-Note:
-
-1. Please note that measuring INT8 performance is only supported via a `calibration cache` file or QAT mode for `torch_tensorrt` backend.
-2. TensorRT engine filename should end with `.plan` otherwise it will be treated as Torchscript module.
-
-### Using CompileSpec options via CLI
-
 Here are the list of `CompileSpec` options that can be provided directly to compile the pytorch module
 
-* `--backends` : Comma separated string of backends. Eg: torch,torch_tensorrt,tensorrt,fx2trt
-* `--model` : Name of the model file (Can be a torchscript module or a tensorrt engine (ending in `.plan` extension)). If the backend is `fx2trt`, the input should be a Pytorch module (instead of a torchscript module) and the options for model are (`vgg16` | `resnet50` | `efficientnet_b0`)
-* `--model_torch` : Name of the PyTorch model file (optional, only necessary if fx2trt is a chosen backend)
+* `--backends` : Comma separated string of backends. Eg: torch, torch_compile, dynamo, tensorrt
+* `--model` : Name of the model file (Can be a torchscript module or a tensorrt engine (ending in `.plan` extension)). If the backend is `dynamo` or `torch_compile`, the input should be a Pytorch module (instead of a torchscript module).
+* `--model_torch` : Name of the PyTorch model file (optional, only necessary if `dynamo` or `torch_compile` is a chosen backend)
 * `--inputs` : List of input shapes & dtypes. Eg: (1, 3, 224, 224)@fp32 for Resnet or (1, 128)@int32;(1, 128)@int32 for BERT
 * `--batch_size` : Batch size
 * `--precision` : Comma separated list of precisions to build TensorRT engine Eg: fp32,fp16
@@ -131,10 +62,15 @@ Eg:
                    --model_torch ${MODELS_DIR}/vgg16_torch.pt \
                    --precision fp32,fp16 --inputs="(1, 3, 224, 224)@fp32" \
                    --batch_size 1 \
-                   --backends torch,torch_tensorrt,tensorrt,fx2trt \
+                   --backends torch,ts_trt,dynamo,torch_compile,tensorrt \
                    --report "vgg_perf_bs1.txt"
 ```
 
+Note:
+
+1. Please note that measuring INT8 performance is only supported via a `calibration cache` file or QAT mode for `torch_tensorrt` backend.
+2. TensorRT engine filename should end with `.plan` otherwise it will be treated as Torchscript module.
+
 ### Example models
 
 This tool benchmarks any pytorch model or torchscript module. As an example, we provide VGG16, Resnet50, EfficientNet-B0, VIT, HF-BERT models in `hub.py` that we internally test for performance.
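
The `--inputs` format documented in the README hunks above (a shape in parentheses, an optional `@dtype` suffix, entries separated by `;`) can be illustrated with a small parser. This is a sketch only and not the parsing code used by `perf_run.py`; the function name `parse_inputs` and the `fp32` default for entries without a suffix are assumptions made for illustration.

```python
from typing import List, Tuple

InputSpec = Tuple[Tuple[int, ...], str]


def parse_inputs(spec: str, default_dtype: str = "fp32") -> List[InputSpec]:
    """Split an --inputs style string into (shape, dtype) pairs.

    Hypothetical helper: the default dtype is an assumption of this sketch.
    """
    parsed = []
    for item in spec.split(";"):
        # "(1, 128)@int32" -> shape text "(1, 128)", dtype "int32"
        shape_text, _, dtype = item.strip().partition("@")
        dims = shape_text.strip().strip("()").split(",")
        shape = tuple(int(d) for d in dims if d.strip())
        parsed.append((shape, dtype.strip() or default_dtype))
    return parsed


# The two examples given in the README text.
resnet_inputs = parse_inputs("(1, 3, 224, 224)@fp32")
bert_inputs = parse_inputs("(1, 128)@int32;(1, 128)@int32")
```

Keeping the shape as a tuple of integers and the dtype as a plain string leaves the mapping to framework dtypes to the caller.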

tools/perf/benchmark.sh

Lines changed: 58 additions & 12 deletions
@@ -6,62 +6,108 @@ MODELS_DIR="models"
 python hub.py
 
 batch_sizes=(1 2 4 8 16 32 64 128 256)
+large_model_batch_sizes=(1 2 4 8 16 32 64)
 
-#Benchmark VGG16 model
+
+# Benchmark VGG16 model
 echo "Benchmarking VGG16 model"
 for bs in ${batch_sizes[@]}
 do
   python perf_run.py --model ${MODELS_DIR}/vgg16_scripted.jit.pt \
-                     --model_torch ${MODELS_DIR}/vgg16_pytorch.pt \
+                     --model_torch vgg16 \
                      --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
                      --batch_size ${bs} \
-                     --backends torch,torch_tensorrt,tensorrt,fx2trt \
-                     --report "vgg_perf_bs${bs}.txt"
+                     --truncate \
+                     --backends torch,ts_trt,dynamo,torch_compile,inductor \
+                     --report "vgg16_perf_bs${bs}.txt"
+done
+
+# Benchmark AlexNet model
+echo "Benchmarking AlexNet model"
+for bs in ${batch_sizes[@]}
+do
+  python perf_run.py --model ${MODELS_DIR}/alexnet_scripted.jit.pt \
+                     --model_torch alexnet \
+                     --precision fp32,fp16 --inputs="(${bs}, 3, 227, 227)" \
+                     --batch_size ${bs} \
+                     --truncate \
+                     --backends torch,ts_trt,dynamo,torch_compile,inductor \
+                     --report "alexnet_perf_bs${bs}.txt"
 done
 
 # Benchmark Resnet50 model
 echo "Benchmarking Resnet50 model"
 for bs in ${batch_sizes[@]}
 do
   python perf_run.py --model ${MODELS_DIR}/resnet50_scripted.jit.pt \
-                     --model_torch ${MODELS_DIR}/resnet50_pytorch.pt \
+                     --model_torch resnet50 \
                      --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
                      --batch_size ${bs} \
-                     --backends torch,torch_tensorrt,tensorrt,fx2trt \
-                     --report "rn50_perf_bs${bs}.txt"
+                     --truncate \
+                     --backends torch,ts_trt,dynamo,torch_compile,inductor \
+                     --report "resnet50_perf_bs${bs}.txt"
 done
 
 # Benchmark VIT model
 echo "Benchmarking VIT model"
 for bs in ${batch_sizes[@]}
 do
   python perf_run.py --model ${MODELS_DIR}/vit_scripted.jit.pt \
+                     --model_torch vit \
                      --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
                      --batch_size ${bs} \
-                     --backends torch,torch_tensorrt,tensorrt \
+                     --truncate \
+                     --backends torch,ts_trt,dynamo,torch_compile,inductor \
                      --report "vit_perf_bs${bs}.txt"
 done
 
+# Benchmark VIT Large model
+echo "Benchmarking VIT Large model"
+for bs in ${large_model_batch_sizes[@]}
+do
+  python perf_run.py --model ${MODELS_DIR}/vit_large_scripted.jit.pt \
+                     --model_torch vit_large \
+                     --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
+                     --truncate \
+                     --batch_size ${bs} \
+                     --backends torch,ts_trt,dynamo,torch_compile,inductor \
+                     --report "vit_large_perf_bs${bs}.txt"
+done
+
 # Benchmark EfficientNet-B0 model
 echo "Benchmarking EfficientNet-B0 model"
 for bs in ${batch_sizes[@]}
 do
   python perf_run.py --model ${MODELS_DIR}/efficientnet_b0_scripted.jit.pt \
-                     --model_torch ${MODELS_DIR}/efficientnet_b0_pytorch.pt \
+                     --model_torch efficientnet_b0 \
                      --precision fp32,fp16 --inputs="(${bs}, 3, 224, 224)" \
                      --batch_size ${bs} \
-                     --backends torch,torch_tensorrt,tensorrt,fx2trt \
-                     --report "eff_b0_perf_bs${bs}.txt"
+                     --truncate \
+                     --backends torch,ts_trt,dynamo,torch_compile,inductor \
+                     --report "efficientnet_b0_perf_bs${bs}.txt"
+done
+
+# Benchmark Stable Diffusion UNet model
+echo "Benchmarking SD UNet model"
+for bs in ${large_model_batch_sizes[@]}
+do
+  python perf_run.py --model_torch sd_unet \
+                     --precision fp32,fp16 --inputs="(${bs}, 4, 128, 128)@fp16;(${bs})@fp16;(${bs}, 1, 768)@fp16" \
+                     --batch_size ${bs} \
+                     --backends torch,dynamo,torch_compile,inductor \
+                     --truncate \
+                     --report "sd_unet_perf_bs${bs}.txt"
 done
 
 # Benchmark BERT model
 echo "Benchmarking Huggingface BERT base model"
 for bs in ${batch_sizes[@]}
 do
   python perf_run.py --model ${MODELS_DIR}/bert_base_uncased_traced.jit.pt \
+                     --model_torch "bert_base_uncased" \
                      --precision fp32 --inputs="(${bs}, 128)@int32;(${bs}, 128)@int32" \
                      --batch_size ${bs} \
-                     --backends torch,torch_tensorrt \
+                     --backends torch,ts_trt,dynamo,torch_compile,inductor \
                      --truncate \
                      --report "bert_base_perf_bs${bs}.txt"
 done
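
The per-model loops in `benchmark.sh` all follow the same pattern: iterate over a batch-size list and pass the same set of flags to `perf_run.py`. The sketch below shows that pattern in Python, using only flags that appear in the script above. The helper name `build_command`, the `{bs}` placeholder, and the rule for deriving the report filename from `--model_torch` are assumptions of this sketch (the shell script spells each report name out by hand), and nothing is executed: the function only builds argument lists.

```python
from typing import List, Optional

# Batch-size lists copied from benchmark.sh.
BATCH_SIZES = [1, 2, 4, 8, 16, 32, 64, 128, 256]
LARGE_MODEL_BATCH_SIZES = [1, 2, 4, 8, 16, 32, 64]


def build_command(
    model_torch: str,
    input_template: str,
    batch_size: int,
    backends: str,
    precision: str = "fp32,fp16",
    scripted_model: Optional[str] = None,
) -> List[str]:
    """Return the argument list for one perf_run.py invocation (hypothetical helper)."""
    cmd = ["python", "perf_run.py"]
    if scripted_model is not None:
        # Some entries (e.g. sd_unet) pass no TorchScript file at all.
        cmd += ["--model", scripted_model]
    cmd += [
        "--model_torch", model_torch,
        "--precision", precision,
        "--inputs=" + input_template.replace("{bs}", str(batch_size)),
        "--batch_size", str(batch_size),
        "--truncate",
        "--backends", backends,
        # Report name derived from model_torch: a simplification of this sketch.
        "--report", f"{model_torch}_perf_bs{batch_size}.txt",
    ]
    return cmd


vgg16_commands = [
    build_command(
        "vgg16",
        "({bs}, 3, 224, 224)",
        bs,
        "torch,ts_trt,dynamo,torch_compile,inductor",
        scripted_model="models/vgg16_scripted.jit.pt",
    )
    for bs in BATCH_SIZES
]

sd_unet_commands = [
    build_command(
        "sd_unet",
        "({bs}, 4, 128, 128)@fp16;({bs})@fp16;({bs}, 1, 768)@fp16",
        bs,
        "torch,dynamo,torch_compile,inductor",
    )
    for bs in LARGE_MODEL_BATCH_SIZES
]
```

Each resulting list could be handed to `subprocess.run` by a caller that wants to launch the sweep; returning plain lists keeps the sketch free of side effects.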

tools/perf/config/vgg16.yml

Lines changed: 0 additions & 19 deletions
This file was deleted.

tools/perf/config/vgg16_trt.yml

Lines changed: 0 additions & 20 deletions
This file was deleted.

tools/perf/custom_models.py

Lines changed: 20 additions & 15 deletions
@@ -1,10 +1,18 @@
 import torch
-import torch.nn as nn
-from transformers import BertModel, BertTokenizer, BertConfig
-import torch.nn.functional as F
 
 
 def BertModule():
+    from transformers import BertModel
+
+    model_name = "bert-base-uncased"
+    model = BertModel.from_pretrained(model_name, torchscript=True)
+    model.eval()
+    return model
+
+
+def BertInputs():
+    from transformers import BertTokenizer
+
     model_name = "bert-base-uncased"
     enc = BertTokenizer.from_pretrained(model_name)
     text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
@@ -15,16 +23,13 @@ def BertModule():
     segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
     tokens_tensor = torch.tensor([indexed_tokens])
     segments_tensors = torch.tensor([segments_ids])
-    config = BertConfig(
-        vocab_size_or_config_json_file=32000,
-        hidden_size=768,
-        num_hidden_layers=12,
-        num_attention_heads=12,
-        intermediate_size=3072,
-        torchscript=True,
+    return [tokens_tensor, segments_tensors]
+
+
+def StableDiffusionUnet():
+    from diffusers import DiffusionPipeline
+
+    pipe = DiffusionPipeline.from_pretrained(
+        "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16
     )
-    model = BertModel(config)
-    model.eval()
-    model = BertModel.from_pretrained(model_name, torchscript=True)
-    traced_model = torch.jit.trace(model, [tokens_tensor, segments_tensors])
-    return traced_model
+    return pipe.unet
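
The `custom_models.py` hunks above move the `transformers` and `diffusers` imports from module level into the functions that need them, so importing the module no longer requires those packages. A minimal sketch of that lazy-import pattern follows. It uses only the standard library; the registry `MODEL_FACTORIES`, the `build` helper, and the package name `hypothetical_heavy_dependency` are all invented for illustration and are not part of the commit.

```python
from typing import Any, Callable, Dict


def small_model() -> Dict[str, str]:
    """Factory with no optional dependencies."""
    return {"name": "small_model"}


def optional_model() -> Any:
    """Factory whose dependency is imported only when it is called."""
    # Hypothetical package name, chosen so that the import fails here.
    import hypothetical_heavy_dependency

    return hypothetical_heavy_dependency.build()


# Hypothetical registry mapping model names to factories.
MODEL_FACTORIES: Dict[str, Callable[[], Any]] = {
    "small_model": small_model,
    "optional_model": optional_model,
}


def build(name: str) -> Any:
    """Build a model by name, reporting a missing optional package clearly."""
    try:
        return MODEL_FACTORIES[name]()
    except ImportError as exc:
        raise RuntimeError(
            f"model '{name}' needs an optional package: {exc.name}"
        ) from exc


model = build("small_model")  # works without the optional package installed
```

With this layout, a missing optional package only surfaces when the model that needs it is requested, not when the module is imported.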
