-
Notifications
You must be signed in to change notification settings - Fork 347
[no-relnote] Update E2E test suite #1048
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the E2E test suite for the NVIDIA Container Toolkit. Key changes include:
- Updating license headers to use SPDX identifiers.
- Refactoring test files to use a global “runner” variable in place of a locally scoped “r” and adding log messages before running container commands.
- Changing environment variable names for the container image from TOOLKIT_IMAGE to E2E_IMAGE_REPO and introducing E2E_IMAGE_TAG, with corresponding updates in the GitHub workflow.
Reviewed Changes
Copilot reviewed 46 out of 48 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
tests/e2e/runner.go | Updated license header to use SPDX identifiers. |
tests/e2e/nvidia-container-toolkit_test.go | Replaced local runner “r” with global “runner” along with added log messages. |
tests/e2e/installer.go | Updated license header to use SPDX identifiers. |
tests/e2e/e2e_test.go | Refactored env variable usage, introduced ImageRepo/ImageTag, and streamlined test env setup. |
tests/e2e/README.md | New file documenting the E2E suite but the env variable names may need updating. |
.github/workflows/e2e.yaml | Adjusted environment variables for image repository and tag accordingly. |
Files not reviewed (2)
- tests/e2e/Makefile: Language not supported
- tests/go.mod: Language not supported
Comments suppressed due to low confidence (2)
tests/e2e/nvidia-container-toolkit_test.go:59
- The log message duplicates the '--gpus' flag; consider revising it to accurately reflect the command arguments (e.g. 'By("Running docker run with --runtime=nvidia --gpus=all")').
By("Running docker run with --gpus=all --runtime=nvidia --gpus all")
tests/e2e/README.md:62
- The code now uses 'E2E_IMAGE_REPO' and 'E2E_IMAGE_TAG' for the container image instead of 'TOOLKIT_IMAGE'; please update the documentation accordingly.
| `TOOLKIT_IMAGE` | ✔ | `nvcr.io/nvidia/cuda:12.4.0-runtime-ubi9` | Image that will be pulled & executed. |
bf24666
to
f5c3347
Compare
5280561
to
b372840
Compare
PR is ready for review @elezar |
b372840
to
6f07ee9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the end‑to‑end test suite by refactoring test runners, updating copyright headers, and standardizing environment variable usage.
- Updated SPDX headers across test files.
- Replaced inline runner instances with a global runner variable and refactored test commands to use formatted image references.
- Added new test environment setup logic and improved the GitHub Actions workflow.
Reviewed Changes
Copilot reviewed 46 out of 48 changed files in this pull request and generated 1 comment.
Show a summary per file
File | Description |
---|---|
tests/e2e/runner.go | Updated header information with SPDX details. |
tests/e2e/nvidia-container-toolkit_test.go | Refactored test cases: replaced local runner with global 'runner' and added logging messages. |
tests/e2e/installer.go | Updated header information with SPDX details. |
tests/e2e/e2e_test.go | Introduced comprehensive environment setup and refactored flag usage. |
tests/e2e/README.md | Added documentation for the E2E test suite. |
.github/workflows/e2e.yaml | Updated environment variable usage and test command targets. |
Files not reviewed (2)
- tests/e2e/Makefile: Language not supported
- tests/go.mod: Language not supported
Comments suppressed due to low confidence (1)
tests/e2e/e2e_test.go:149
- [nitpick] The function name getIntEnvVar suggests an integer return type, but it returns a string; consider renaming it for clarity.
func getIntEnvVar(key string, defaultValue int) string {
6f07ee9
to
de1e004
Compare
Looks like this PR has unresolved merge conflicts |
de1e004
to
da7dd89
Compare
Rebased |
ca67999
to
f9792b7
Compare
.github/workflows/ci.yaml
Outdated
@@ -51,3 +51,4 @@ jobs: | |||
uses: ./.github/workflows/e2e.yaml | |||
with: | |||
version: ${{ needs.variables.outputs.version }} | |||
distribution: ubuntu20.04 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we're working on #602 I think we should just set the distribution in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
.github/workflows/e2e.yaml
Outdated
@@ -82,8 +85,15 @@ jobs: | |||
chmod 600 "$e2e_ssh_key" | |||
export E2E_SSH_KEY="$e2e_ssh_key" | |||
|
|||
make -f tests/e2e/Makefile test | |||
make -f tests/e2e/Makefile test-e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We're already in the e2e
folder. I think just test
is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
tests/e2e/Makefile
Outdated
@@ -12,34 +13,14 @@ | |||
# See the License for the specific language governing permissions and | |||
# limitations under the License. | |||
|
|||
GO_CMD ?= go | |||
.PHONY: test-e2e ginkgo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.PHONY: test-e2e ginkgo | |
.PHONY: test ginkgo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
tests/e2e/Makefile
Outdated
|
||
E2E_RUNTIME ?= docker | ||
ginkgo: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be:
ginkgo: | |
ginkgo: $(CURDIR)/bin/ginkgo |
so that we don't ALWAYS install ginkgo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree
tests/e2e/Makefile
Outdated
-ssh-user=$(E2E_SSH_USER) \ | ||
-remote-host=$(E2E_SSH_HOST) \ | ||
-remote-port=$(E2E_SSH_PORT) | ||
test-e2e: ginkgo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test-e2e: ginkgo | |
test: ginkgo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
tests/e2e/README.md
Outdated
--- | ||
|
||
## 1 Scope & Goals | ||
This repository contains a **Ginkgo v2 / Gomega** test harness that exercises an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be folder
and not repository
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, agree
tests/e2e/README.md
Outdated
* Environment discovery happens once in `TestMain` → `getTestEnv()`; parameters | ||
are therefore immutable for the duration of the run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like the kind of thing that can get out of sync with the actual implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getTestEnv
is called at Main, not during a BeforeSuite, so it is only run once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the E2E test suite by refactoring the test harness settings, updating license headers to use SPDX tags, and enhancing workflow steps. Key changes include renaming and consolidating environment variables used for runner configuration, refactoring test commands to use a global runner instead of local instances, and updating the GitHub workflow to archive Ginkgo logs.
Reviewed Changes
Copilot reviewed 7 out of 9 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
tests/e2e/runner.go | Updated license header to SPDX format |
tests/e2e/nvidia-container-toolkit_test.go | Refactored runner usage from a local variable to a shared global runner |
tests/e2e/installer.go | Updated license header to SPDX format |
tests/e2e/e2e_test.go | Replaced flag parsing with environment variable validation and added new global configurations |
tests/e2e/README.md | Added detailed documentation for the updated test suite |
.github/workflows/e2e.yaml | Updated environment variable names and added a step to archive logs |
Files not reviewed (2)
- tests/e2e/Makefile: Language not supported
- tests/go.mod: Language not supported
Comments suppressed due to low confidence (1)
tests/e2e/e2e_test.go:155
- [nitpick] The function name getIntEnvVar implies an integer return type; consider renaming it to reflect that it returns a string, or update it to return an int for clarity.
func getIntEnvVar(key string, defaultValue int) string {
f067fdc
to
b58d33b
Compare
tests/e2e/runner.go
Outdated
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
* SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: let's not make these adjustments if they're not required. We don't have consistnt SPDX usage anyway and if we update it in this file, that should be as a single commit / PR that is out of scope fo this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
Expect(err).ToNot(HaveOccurred()) | ||
Expect(ldconfigOut).To(ContainSubstring("/usr/local/cuda/compat")) | ||
}) | ||
|
||
It("should NOT work with nvidia-container-runtime-hook", func(ctx context.Context) { | ||
ldconfigOut, _, err := r.Run("docker run --rm -i -e NVIDIA_DISABLE_REQUIRE=true --runtime=runc --gpus all nvcr.io/nvidia/cuda:12.8.0-base-ubi8 bash -c \"ldconfig -p | grep libcuda.so.1\"") | ||
By("Running docker run with -e NVIDIA_DISABLE_REQUIRE=true --gpus all") | ||
ldconfigOut, _, err := runner.Run(fmt.Sprintf("docker run --rm -i -e NVIDIA_DISABLE_REQUIRE=true --runtime=runc --gpus all %s bash -c \"ldconfig -p | grep libcuda.so.1\"", cudaImage)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how I feel about injecting the cudaImage
into the string. This makes the actual command we're running a log more difficult to read. This change also seems orthogonal to what we're trying to acheive in this PR -- which is to align the MECHANICS of the tests with those of the other projects we maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, I'll return to the full string as before.
@@ -198,19 +174,22 @@ var _ = Describe("docker", Ordered, ContinueOnFailure, func() { | |||
}) | |||
|
|||
It("should work with the nvidia runtime in legacy mode", func(ctx context.Context) { | |||
ldconfigOut, _, err := r.Run("docker run --rm -i -e NVIDIA_DISABLE_REQUIRE=true --runtime=nvidia --gpus all nvcr.io/nvidia/cuda:12.8.0-base-ubi8 bash -c \"ldconfig -p | grep libcuda.so.1\"") | |||
By("Running docker run with -e NVIDIA_DISABLE_REQUIRE=true --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we introducing a new context here? The It()
statement above is decribing what we're testing and all this By
statement is doing is repeating the command that we're running below. What value does this add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the idea of the by
is to be more verbose during each step
------------------------------
docker when Running the cuda-deviceQuery sample should support automatic CDI spec generation
/home/runner/_work/nvidia-container-toolkit/nvidia-container-toolkit/tests/e2e/nvidia-container-toolkit_test.go:130
STEP: Running docker run with --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=runtime.nvidia.com/gpu=all @ 05/06/25 05:40:14.795
• [7.565 seconds]
------------------------------
docker when Running the cuda-deviceQuery sample should support the --gpus flag using the nvidia-container-runtime
/home/runner/_work/nvidia-container-toolkit/nvidia-container-toolkit/tests/e2e/nvidia-container-toolkit_test.go:137
STEP: Running docker run with --runtime=nvidia --gpus all @ 05/06/25 05:40:22.36
• [6.087 seconds]
------------------------------
docker when Running the cuda-deviceQuery sample should support the --gpus flag using the nvidia-container-runtime-hook
/home/runner/_work/nvidia-container-toolkit/nvidia-container-toolkit/tests/e2e/nvidia-container-toolkit_test.go:144
STEP: Running docker run with --gpus all @ 05/06/25 05:40:28.448
• [6.062 seconds]
the STEP
line is the reflect of the By
, I think it makes reading the test logs much easier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reading the ginkgo documentation on By()
Generally you should try to keep your Its short and to the point. This is not always possible, however, especially in the context of integration tests that capture complex or lengthy workflows.
I decided to remove this as per our discussion
tests/e2e/installer.go
Outdated
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
* See the License for the specific language governing permissions and | ||
* limitations under the License. | ||
* SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment to runner.go
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, will return to regular LICENSE header
tests/e2e/e2e_test.go
Outdated
return strconv.Itoa(defaultValue) | ||
} | ||
|
||
return strconv.Itoa(intValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're calling Itoa
three times here -- twice with the same argument -- is there not a better way to implement this. What about splitting this into two functions:
getEnvVarAs[T](string) (T, error)
getEnvVarOrDefault[T](string, T) T
we could then call getEnvVarAs
from getEnvVarOrDefault
and return the default value if we return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
}) | ||
|
||
// getTestEnv gets the test environment variables | ||
func getTestEnv() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please make sure that the envvars here match the ones in the README (or remove the readme section).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/e2e/e2e_test.go
Outdated
// TODO (@ArangoGutierrez): | ||
// once https://github.com/NVIDIA/nvidia-container-toolkit/pull/602 | ||
// is merged, remove this | ||
ImageTag = fmt.Sprintf("%s-ubuntu20.04", ImageTag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is NOT what I meant by my comment. We should append this suffix in the YAML file and not here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
tests/e2e/e2e_test.go
Outdated
ImageRepo = os.Getenv("E2E_IMAGE_REPO") | ||
Expect(ImageRepo).NotTo(BeEmpty(), "E2E_IMAGE_REPO environment variable must be set") | ||
|
||
ImageTag = os.Getenv("E2E_IMAGE_TAG") | ||
Expect(ImageTag).NotTo(BeEmpty(), "E2E_IMAGE_TAG environment variable must be set") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are only relevant if E2E_INSTALL_CTK == true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
tests/e2e/e2e_test.go
Outdated
_, _, err := runner.Run("docker pull ubuntu") | ||
Expect(err).ToNot(HaveOccurred()) | ||
|
||
_, _, err = runner.Run(fmt.Sprintf("docker pull %s", vectorAddImage)) | ||
Expect(err).ToNot(HaveOccurred()) | ||
|
||
_, _, err = runner.Run(fmt.Sprintf("docker pull %s", deviceQueryImage)) | ||
Expect(err).ToNot(HaveOccurred()) | ||
_, _, err = runner.Run(fmt.Sprintf("docker pull %s", cudaImage)) | ||
Expect(err).ToNot(HaveOccurred()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if a user is only running a subset of tests? The pull of the images should be LOCAL to the tests that use the images. Do you have a motivation for collecting them in one place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe as a general comment: When a developer is writing tests, we want to avoid having them have to remember that they also have to add the pull checks here when adding tests for new functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
b58d33b
to
babdeb1
Compare
babdeb1
to
9defe37
Compare
.gitignore
Outdated
@@ -11,3 +11,4 @@ | |||
/nvidia-ctk | |||
/shared-* | |||
/release-* | |||
/bin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: No newline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, done
tests/e2e/README.md
Outdated
REMOTE_HOST=10.0.0.15 \ | ||
REMOTE_PORT=22 \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not consistent with the actual implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this section
tests/e2e/README.md
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove this file from the repo. The Makefile is currently simple enough to act as an example for how to run the tests. (ok that's no longer true, but I don't see value in updating things in two places when we add functionality).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simplified the README
tests/e2e/e2e_test.go
Outdated
cwd string | ||
packagePath string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe these are used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Umm yeah, that was used in a previos iteration but I ended up removing the need for it, thanks, I have removed it
tests/e2e/e2e_test.go
Outdated
) | ||
sshKey string | ||
sshUser string | ||
host string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of scope for this PR, but we could consider using sshHost
here instead for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
9defe37
to
d239850
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArangoGutierrez please see my proposed changes as a commit on top.
Signed-off-by: Carlos Eduardo Arango Gutierrez <[email protected]> Signed-off-by: Evan Lezar <[email protected]>
4a66b60
to
eb40f24
Compare
No description provided.