doc: add doc about development on cloud or runpod #3194

Merged 1 commit on Apr 2, 2025
56 changes: 56 additions & 0 deletions docs/source/dev-on-cloud/build-image-to-dockerhub.md
@@ -0,0 +1,56 @@
(build-image-to-dockerhub)=

# Build the TensorRT-LLM Docker Image
When you develop TensorRT-LLM on a cloud platform such as Runpod, you may need to provide a Docker image to the platform. To do so, first upload the image to DockerHub.

## Build the TensorRT-LLM Docker Image and Upload to DockerHub

```bash
make -C docker build
```
This produces a Docker image named `tensorrt_llm/devel:latest`.
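
Before continuing, it can help to confirm that the image exists locally. The following is a minimal sketch, assuming the `docker` CLI is on your `PATH`; the function name `require_image` is hypothetical and not part of TensorRT-LLM.

```bash
# Hypothetical helper: fail early if an expected local image is missing.
# $1 = image reference (defaults to the image built by 'make -C docker build').
require_image() {
  image="${1:-tensorrt_llm/devel:latest}"
  # 'docker image inspect' exits non-zero when the image is not present locally
  if docker image inspect "$image" >/dev/null 2>&1; then
    echo "found $image"
  else
    echo "missing $image; run 'make -C docker build' first" >&2
    return 1
  fi
}

# Usage:
#   require_image                      # checks tensorrt_llm/devel:latest
#   require_image tensorrt_llm/devel:with_ssh
```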

### Enable ssh access to the container
The default Docker image doesn't include an SSH server, so we can't SSH into it. To add SSH support to the container, first create a new Dockerfile with the following content:

```Dockerfile
FROM tensorrt_llm/devel:latest

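# Install the OpenSSH server
RUN apt-get update && apt-get install -y openssh-server
# Create the runtime directory that sshd requires
RUN mkdir -p /run/sshd && chmod 755 /run/sshd
# Prepare root's authorized_keys with the permissions sshd expects
RUN mkdir -p /root/.ssh && chmod 700 /root/.ssh && touch /root/.ssh/authorized_keys && chmod 600 /root/.ssh/authorized_keys
# Add sshd to the entrypoint scripts (sshd generally must be invoked by absolute path)
RUN echo "/usr/sbin/sshd -E /opt/sshd.log" >> /opt/nvidia/entrypoint.d/99-start-sshd.sh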
```

Save this Dockerfile as `Dockerfile.ssh`, then build the Docker image with the following command:

```bash
docker build -t tensorrt_llm/devel:with_ssh -f Dockerfile.ssh .
```

This produces a Docker image named `tensorrt_llm/devel:with_ssh`.

## Upload the Docker Image to DockerHub

If you don't have a [DockerHub](https://hub.docker.com) account, register one first.

Then click 'Personal Access Tokens' in the user menu and create a new token.

With the token, you can log in to DockerHub with the following command:

```bash
docker login -u <your_dockerhub_username>
```

Enter the token when prompted for a password.
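
For scripted use, the token can be passed on standard input instead of typed at the prompt. The following is a minimal sketch using Docker's `--password-stdin` option; the function name `dockerhub_login` and the `DOCKERHUB_USER` and `DOCKERHUB_TOKEN` variables are assumptions made for illustration.

```bash
# Hypothetical helper: log in to DockerHub without an interactive prompt.
# Assumes DOCKERHUB_USER and DOCKERHUB_TOKEN are set in the environment.
dockerhub_login() {
  if [ -z "${DOCKERHUB_USER:-}" ] || [ -z "${DOCKERHUB_TOKEN:-}" ]; then
    echo "DOCKERHUB_USER and DOCKERHUB_TOKEN must be set" >&2
    return 1
  fi
  # --password-stdin avoids passing the token as a command-line argument
  printf '%s' "$DOCKERHUB_TOKEN" | docker login -u "$DOCKERHUB_USER" --password-stdin
}

# Usage:
#   export DOCKERHUB_USER=<your_dockerhub_username>
#   export DOCKERHUB_TOKEN=<your_token>
#   dockerhub_login
```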

After logging in, tag and push the Docker image to DockerHub with the following commands:

```bash
docker tag tensorrt_llm/devel:with_ssh <your_dockerhub_username>/tensorrt_llm:devel
docker push <your_dockerhub_username>/tensorrt_llm:devel
```

Finally, the Docker image appears in your DockerHub repository, and you can reference it with a link such as `docker.io/<your_dockerhub_username>/tensorrt_llm:devel`.
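
The tag and push steps can also be wrapped in a small function that prints the resulting image link. This is only a sketch: the function name `push_devel_image` is hypothetical, and it assumes the local image `tensorrt_llm/devel:with_ssh` already exists and that you are logged in to DockerHub.

```bash
# Hypothetical helper: tag the local SSH-enabled image and push it to DockerHub.
# $1 = DockerHub username, $2 = optional tag (defaults to "devel" as in this guide).
push_devel_image() {
  user="${1:-}"
  tag="${2:-devel}"
  if [ -z "$user" ]; then
    echo "usage: push_devel_image <dockerhub_username> [tag]" >&2
    return 1
  fi
  remote="$user/tensorrt_llm:$tag"
  docker tag tensorrt_llm/devel:with_ssh "$remote" || return 1
  docker push "$remote" || return 1
  # Last line of output: the fully qualified reference to give to the cloud platform
  echo "docker.io/$remote"
}

# Usage:
#   push_devel_image <your_dockerhub_username>
```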
47 changes: 47 additions & 0 deletions docs/source/dev-on-cloud/dev-on-runpod.md
@@ -0,0 +1,47 @@
(dev-on-runpod)=

# Develop TensorRT-LLM on Runpod
[Runpod](https://runpod.io) is a popular cloud platform among many researchers. This doc describes how to develop TensorRT-LLM on Runpod.

## Prepare

### Create a Runpod account
Please refer to the [Runpod Getting Started](https://docs.runpod.io/get-started/).

### Configure SSH Key
Please refer to [Configure SSH Key](https://docs.runpod.io/pods/configuration/use-ssh).

Note that you can skip the step "Start your Pod. Make sure of the following things" for now, as it is covered below.

## Build the TensorRT-LLM Docker Image and Upload to DockerHub
Please refer to [Build Image to DockerHub](build-image-to-dockerhub.md).

Note that the Docker image must have SSH access enabled. See [Enable ssh access to the container](build-image-to-dockerhub.md#enable-ssh-access-to-the-container).

## Create a Pod Template
Click the "Template" button in the menu, then click the "Create Template" button.

Enter the DockerHub link of your Docker image, such as `docker.io/<your_dockerhub_username>/tensorrt_llm:devel`, in the "Docker Image" field.

Enter "22" in the "Expose TCP Ports" field.

Enter
```bash
sleep infinity
```
in the 'Container Start Command' field.

## Connect to the Pod
Please refer to [Connect to the Pod](https://docs.runpod.io/pods/connect-to-a-pod).

You can connect to the pod with SSH or the Web Terminal.

If you want to connect to the pod with SSH, copy the command from the "SSH over exposed TCP" field and run it on your host.
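
If you reconnect often, you may prefer to store the connection details in `~/.ssh/config`. The following is a sketch only: the function name `ssh_config_entry` and the host alias `runpod-trtllm` are hypothetical, the IP and port must be taken from the "SSH over exposed TCP" command, and the `root` user is an assumption based on the image described in this guide.

```bash
# Hypothetical helper: print an ~/.ssh/config entry for a pod.
# $1 = pod IP, $2 = exposed TCP port mapped to 22, $3 = optional host alias.
ssh_config_entry() {
  ip="${1:-}"
  port="${2:-}"
  host_alias="${3:-runpod-trtllm}"
  if [ -z "$ip" ] || [ -z "$port" ]; then
    echo "usage: ssh_config_entry <pod_ip> <port> [alias]" >&2
    return 1
  fi
  printf 'Host %s\n    HostName %s\n    Port %s\n    User root\n' \
    "$host_alias" "$ip" "$port"
}

# Usage:
#   ssh_config_entry <pod_ip> <port> >> ~/.ssh/config
#   ssh runpod-trtllm
```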

In some scenarios, such as when using a team account, your public key may not be added to the pod automatically. In that case, you can add the key yourself by entering the following in the 'Container Start Command' field:

```bash
bash -c 'echo "<your_public_key>" >> ~/.ssh/authorized_keys;sleep infinity'
```
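
To avoid pasting the key by hand, the start command can be generated from a public key file. This is a minimal sketch: the function name `start_command_for_key` is hypothetical, and the default path `~/.ssh/id_ed25519.pub` is an assumption about where your key lives.

```bash
# Hypothetical helper: print a 'Container Start Command' that installs a public key.
# $1 = path to an SSH public key file (defaults to ~/.ssh/id_ed25519.pub).
start_command_for_key() {
  key_file="${1:-$HOME/.ssh/id_ed25519.pub}"
  if [ ! -r "$key_file" ]; then
    echo "cannot read public key: $key_file" >&2
    return 1
  fi
  # Assumes the key line (including its comment) contains no quote characters
  key=$(head -n 1 "$key_file")
  printf "bash -c 'echo \"%s\" >> ~/.ssh/authorized_keys;sleep infinity'\n" "$key"
}

# Usage:
#   start_command_for_key ~/.ssh/id_ed25519.pub
#   # paste the printed line into the 'Container Start Command' field
```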

Enjoy your development!