`polygraphy run` allows you to run inference with multiple backends, including TensorRT and ONNX-Runtime, and compare outputs. While it does provide mechanisms to load and compare against custom outputs from unsupported backends, adding support for a backend via an extension module integrates it more seamlessly and provides a better user experience.
In this example, we'll create an extension module for `polygraphy run` called `polygraphy_reshape_destroyer`, which will include the following:
- A special loader that will replace no-op `Reshape` nodes in an ONNX model with `Identity` nodes.
- A custom runner that supports ONNX models containing only `Identity` nodes.
- Command-line options to:
    - Enable or disable renaming nodes when a transformation is applied by the loader.
    - Run the model in `slow`, `medium`, or `fast` mode. In `slow` and `medium` modes, we'll inject a `time.sleep()` during inference (this will result in massive performance gains in `fast` mode!).
Although this example is self-contained and concepts will be explained as you encounter them, it is still recommended that you first familiarize yourself with Polygraphy's Loader and Runner APIs, the Argument Group interface, and the Script interface.
After that, creating an extension module for `polygraphy run` is a simple matter of defining your custom Loaders/Runners and Argument Groups and making them visible to Polygraphy via `setuptools`'s `entry_points` API.
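For intuition about how entry points make a plugin discoverable, the sketch below lists whatever is registered under a given entry-point group using the standard library's `importlib.metadata`. The helper name `find_plugins` is hypothetical, and the group name `polygraphy.run.plugins` used in the usage note is an assumption; check the example's `setup.py` for the value actually registered.

```python
from importlib import metadata


def find_plugins(group):
    """Return {entry point name: "module:attr" target} for one entry-point group."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points can be filtered by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}
```

After installing an extension module, calling `find_plugins("polygraphy.run.plugins")` (group name assumed) should include an entry for it if the module registered itself under that group.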
NOTE: Defining a custom Loader is not strictly required, but will be covered in this example for the sake of completeness.
As a matter of convention, Polygraphy extension module names are prefixed with `polygraphy_`.
We've structured our example extension module such that it somewhat mirrors the structure of the Polygraphy repository. This should make it easier to see the parallels between functionality in the extension module and that provided by Polygraphy natively. The structure is:
- extension_module/
    - polygraphy_reshape_destroyer/
        - backend/
            - __init__.py # Controls submodule-level exports.
            - loader.py # Defines our custom loader.
            - runner.py # Defines our custom runner.
        - args/
            - __init__.py # Controls submodule-level exports.
            - loader.py # Defines command-line argument group for our custom loader.
            - runner.py # Defines command-line argument group for our custom runner.
        - __init__.py # Controls module-level exports.
        - export.py # Defines the entry-point for `polygraphy run`.
    - setup.py # Builds our module.
It is recommended that you read these files in the following order:
- backend/loader.py
- backend/runner.py
- backend/__init__.py
- args/loader.py
- args/runner.py
- args/__init__.py
- __init__.py
- export.py
- setup.py
- Build and install the extension module:
    - Build using `setup.py`:
        python3 extension_module/setup.py bdist_wheel
    - Install the wheel:
        python3 -m pip install extension_module/dist/polygraphy_reshape_destroyer-0.0.1-py3-none-any.whl \
            --extra-index-url https://pypi.ngc.nvidia.com
    TIP: If you make changes to the example extension module, you can update your installed version by rebuilding the wheel with `setup.py` as above and then running:
        python3 -m pip install extension_module/dist/polygraphy_reshape_destroyer-0.0.1-py3-none-any.whl \
            --force-reinstall --no-deps
- Once the extension module is installed, you should see the options you added appear in the help output of `polygraphy run`:
    polygraphy run -h
- Next, we can try out our custom runner with an ONNX model containing a no-op `Reshape`:
    polygraphy run no_op_reshape.onnx --res-des
- We can also try some of the other command-line options we added:
    - Renaming replaced nodes:
        polygraphy run no_op_reshape.onnx --res-des --res-des-rename-nodes
    - Different inference speeds:
        polygraphy run no_op_reshape.onnx --res-des --res-des-speed=slow
        polygraphy run no_op_reshape.onnx --res-des --res-des-speed=medium
        polygraphy run no_op_reshape.onnx --res-des --res-des-speed=fast
- Lastly, let's compare our implementation against ONNX-Runtime to make sure it is functionally correct:
    polygraphy run no_op_reshape.onnx --res-des --onnxrt
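
As a rough illustration of what comparing two backends involves, the sketch below checks that two runners produced the same output names with element-wise close values. It is not Polygraphy's comparison logic: the function name `outputs_match`, the flat-list output format, and the tolerance values are all assumptions made for this example.

```python
import math


def outputs_match(outputs_a, outputs_b, rtol=1e-5, atol=1e-5):
    """Return True if both dicts have the same output names and close values.

    Each dict maps an output name to a flat list of floats (a simplification).
    """
    if outputs_a.keys() != outputs_b.keys():
        return False
    for name, values_a in outputs_a.items():
        values_b = outputs_b[name]
        if len(values_a) != len(values_b):
            return False
        for a, b in zip(values_a, values_b):
            # Illustrative tolerances; a real comparison may define them differently.
            if not math.isclose(a, b, rel_tol=rtol, abs_tol=atol):
                return False
    return True
```

For an identity-only model, both backends should return the input values unchanged, so a check like this would be expected to pass.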