
Commit 2c01adc

amending triton deployment documentation
1 parent a256e6a commit 2c01adc

File tree

1 file changed (+7, -5 lines)


docsrc/tutorials/deploy_torch_tensorrt_to_triton.rst (+7, -5)
@@ -20,7 +20,7 @@ Step 1: Optimize your model with Torch-TensorRT
 Most Torch-TensorRT users will be familiar with this step. For the purpose of
 this demonstration, we will be using a ResNet50 model from Torchhub.

-Let’s first pull the NGC PyTorch Docker container. You may need to create
+Let’s first pull the `NGC PyTorch Docker container <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch>`__. You may need to create
 an account and get the API key from `here <https://ngc.nvidia.com/setup/>`__.
 Sign up and login with your key (follow the instructions
 `here <https://ngc.nvidia.com/setup/api-key>`__ after signing up).
@@ -30,7 +30,8 @@ Sign up and login with your key (follow the instructions
 # <xx.xx> is the yy:mm for the publishing tag for NVIDIA's Pytorch
 # container; eg. 22.04

-docker run -it --gpus all -v ${PWD}:/workspace nvcr.io/nvidia/pytorch:<xx.xx>-py3
+docker run -it --gpus all -v ${PWD}:/scratch_space nvcr.io/nvidia/pytorch:<xx.xx>-py3
+cd /scratch_space

 Once inside the container, we can proceed to download a ResNet model from
 Torchhub and optimize it with Torch-TensorRT.
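
The diff only shows the tail of the optimization snippet (the ``torch.jit.save`` call in the next hunk). For context, a minimal sketch of that step could look like the following, assuming the TorchScript path of ``torch_tensorrt.compile`` and an illustrative 1x3x224x224 FP32 input that is not taken from this commit:

::

   import torch
   import torch_tensorrt

   # Load a pretrained ResNet50 from Torchhub and move it to the GPU
   model = torch.hub.load("pytorch/vision", "resnet50", pretrained=True).eval().to("cuda")

   # Compile with Torch-TensorRT; input shape and precision here are
   # illustrative assumptions, not values taken from this commit
   trt_model = torch_tensorrt.compile(
       model,
       inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.float32)],
       enabled_precisions={torch.float32},
   )

   # Save the compiled TorchScript module (this is the line visible in the next hunk)
   torch.jit.save(trt_model, "model.pt")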
@@ -53,7 +54,8 @@ Torchhub and optimize it with Torch-TensorRT.
 # Save the model
 torch.jit.save(trt_model, "model.pt")

-The next step in the process is to set up a Triton Inference Server.
+After copying the model, exit the container. The next step in the process
+is to set up a Triton Inference Server.

 Step 2: Set Up Triton Inference Server
 --------------------------------------
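
The model repository referenced in the next hunk is assembled in Step 2 of the tutorial, which this commit does not touch. As a rough sketch of the layout Triton conventionally expects for a TorchScript model (the repository path, model name, tensor names, and dims below are assumptions, not taken from this commit):

::

   from pathlib import Path
   import shutil

   # Hypothetical repository location and model name, for illustration only
   repo = Path("model_repository") / "resnet50"
   (repo / "1").mkdir(parents=True, exist_ok=True)

   # Triton's layout for TorchScript models: <repo>/<model>/<version>/model.pt
   shutil.copy("model.pt", repo / "1" / "model.pt")

   # Minimal config.pbtxt for the libtorch backend; tensor names, dtype and
   # dims are assumptions and must match the compiled model
   (repo / "config.pbtxt").write_text(
       'name: "resnet50"\n'
       'platform: "pytorch_libtorch"\n'
       'max_batch_size: 0\n'
       'input [ { name: "input__0" data_type: TYPE_FP32 dims: [ 1, 3, 224, 224 ] } ]\n'
       'output [ { name: "output__0" data_type: TYPE_FP32 dims: [ 1, 1000 ] } ]\n'
   )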
@@ -114,15 +116,15 @@ documentation <https://github.com/triton-inference-server/server/blob/main/docs/
 for more details.

 With the model repository setup, we can proceed to launch the Triton server
-with the docker command below.
+with the docker command below. Refer `this page <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver>`__ for the pull tag for the container.

 ::

 # Make sure that the TensorRT version in the Triton container
 # and TensorRT version in the environment used to optimize the model
 # are the same.

-docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models
+docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /full/path/to/the_model_repository/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models

 This should spin up a Triton Inference server. Next step, building a simple
 http client to query the server.
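
The client half of the tutorial is outside this diff. As a minimal sketch of querying the server launched above, assuming the ``tritonclient`` HTTP package and the hypothetical ``resnet50`` model and tensor names used in the repository sketch earlier:

::

   import numpy as np
   import tritonclient.http as httpclient

   # Connect to the HTTP endpoint mapped by the docker command above
   client = httpclient.InferenceServerClient(url="localhost:8000")

   # Dummy input; name, shape and dtype must match the model's config.pbtxt
   batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
   infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
   infer_input.set_data_from_numpy(batch)

   # Request the output tensor and run a single inference call
   requested = httpclient.InferRequestedOutput("output__0")
   response = client.infer("resnet50", inputs=[infer_input], outputs=[requested])
   print(response.as_numpy("output__0").shape)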
