Skip to content

fix: clean up iptables rules for stopped containers #4255

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

fahedouch
Copy link
Member

@fahedouch fahedouch commented May 16, 2025

fixing this issue #1872 or at least a part of the issue

Based on the runtime-spec documentation, the post-hook is executed after the container task is deleted. This ensures that resources are properly cleaned up when the container is removed, allowing sufficient iptables space for the next container to be created.

I think this should replace #4254 ( cc @apostasie)

@fahedouch fahedouch changed the title fix: clean up iptables rules for stopped containers [WIP] fix: clean up iptables rules for stopped containers May 16, 2025
@apostasie
Copy link
Contributor

It does not feel like this would fix #4253 or any of the same ilk, or will it?

@apostasie
Copy link
Contributor

apostasie commented May 16, 2025

Also be sure to check out: #3357

In case of failure during onCreateRuntime, onPostStop is either wrongly dispatched while the container is not deleted, or the container should have been deleted but is not.
While this might be a bug in containerd (or nerdctl?), or otherwise some unpolished corner of the oci spec, the end-result is the same: we need to prevent onPostStop from proceeding further as the container is not destroyed (this is why we have shouldExit).

Your post-cleanup here (while useful for "normal" exits) will not work in case of failures onCreateRuntime because of ^

@apostasie
Copy link
Contributor

Suggestion is:

  1. isolate cni in its own file, and at least provide a Remove method that incorporates your post-cleanup here
  2. incorporate [FIX]: ensure cleanup of cni config in case of failure in createRuntime #4254 into this patch here, using the updated Remove method

Then we might want to have a look overall at what we do in oci hooks.
There are definitely multiple issues (maybe some outside of nerdctl) - the whole thing is untested, and the code does not look too good...

@fahedouch
Copy link
Member Author

fahedouch commented May 17, 2025

Suggestion is:

  1. isolate cni in its own file, and at least provide a Remove method that incorporates your post-cleanup here
  2. incorporate [FIX]: ensure cleanup of cni config in case of failure in createRuntime #4254 into this patch here, using the updated Remove method

Then we might want to have a look overall at what we do in oci hooks. There are definitely multiple issues (maybe some outside of nerdctl) - the whole thing is untested, and the code does not look too good...

SGTM,
Indeed, OCI operations, especially creation, should be idempotent.

  • Regarding your first point, I agree. However, I'd like to wait for the CNI fix related to purging iptable rules with empty netNs path before refactoring this part, as my function will be affected in that case.
  • For your second point, I agree and will cherry-pick your commit. There's no need for a separate PR for this.

@apostasie
Copy link
Contributor

  • Regarding your first point, I agree. However, I'd like to wait for the CNI fix related to purging iptable rules with empty netNs path before refactoring this part, as my function will be affected in that case.

The original PR has been opened over two years ago.
I don't mean to be negative nor judgmental - what I am saying is that we might prefer to act now, assuming it is not going to be merged anytime soon, and revert later when/if it happens.

@fahedouch fahedouch changed the title [WIP] fix: clean up iptables rules for stopped containers fix: clean up iptables rules for stopped containers May 20, 2025
@fahedouch fahedouch force-pushed the clean-up-iptables-in-oci-post-hook branch 3 times, most recently from b05e3af to 06bc982 Compare May 20, 2025 14:23
@fahedouch fahedouch marked this pull request as draft May 20, 2025 16:05
@fahedouch fahedouch force-pushed the clean-up-iptables-in-oci-post-hook branch 2 times, most recently from e4d1795 to 20e3c2f Compare May 22, 2025 08:37
@fahedouch fahedouch marked this pull request as ready for review May 22, 2025 08:38
@fahedouch fahedouch force-pushed the clean-up-iptables-in-oci-post-hook branch 3 times, most recently from d01cafd to 5eca797 Compare May 24, 2025 20:39
@fahedouch fahedouch added this to the v2.1.3 milestone May 24, 2025
@fahedouch fahedouch requested a review from AkihiroSuda May 24, 2025 22:32
// TestContainerRmIptables tests that iptables rules are cleared after container deletion
func TestContainerRmIptables(t *testing.T) {
if _, err := exec.LookPath("iptables"); err != nil {
t.Skip("iptables command not found, skipping test")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testCase.Require should be used to require the command

testCase := nerdtest.Setup()

// This test requires Linux
testCase.Require = require.Not(require.Windows)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is meaningless on Docker?

}

// In non-rootless mode, check iptables rules directly on the host
return helpers.Custom("sh", "-c", "iptables -t nat -S && iptables -t filter -S && iptables -t mangle -S")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The shell script should be defined as a constant to avoid the code clone across rootless and rootful

@fahedouch fahedouch force-pushed the clean-up-iptables-in-oci-post-hook branch 2 times, most recently from 220d43d to be7756c Compare May 26, 2025 14:48
@fahedouch fahedouch requested a review from AkihiroSuda May 26, 2025 16:02
@fahedouch fahedouch force-pushed the clean-up-iptables-in-oci-post-hook branch 2 times, most recently from 99d989e to 1176445 Compare May 28, 2025 15:41
@fahedouch
Copy link
Member Author

@AkihiroSuda @apostasie PTAL

Description: "Test iptables rules are cleared after container deletion",
Setup: func(data test.Data, helpers test.Helpers) {
// Create a container with port mapping to ensure iptables rules are created
helpers.Ensure("run", "-d", "--name", data.Identifier(), "-p", "8080:80", testutil.NginxAlpineImage)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

8080 may (is) shared by many compose tests, so, presents a risk of colliding port between parallelized tests (once we do parallelize more of them).
port, err = portlock.Acquire(0) and (during cleanup) portlock.Release would help ensuring we get a free port (since we are not specifically committed to 8080).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

container := nerdtest.InspectContainer(helpers, data.Identifier())
containerID = container.ID
data.Labels().Set("containerID", containerID)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just move this section above to Setup, just after container creation - then you would not need to test that the label is empty. I would even call helpers.Capture instead of helpers.Ensure, so that you get the id directly instead of having to Inspect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -479,6 +480,15 @@ func applyNetworkSettings(opts *handlerOpts) error {
if err != nil {
return fmt.Errorf("failed to call cni.Setup: %w", err)
}

// Defer CNI configuration removal to ensure idempotency of oci-hook.
defer func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am now wondering if this should be moved above opts.cni.Setup? (eg: would there be conditions where cni setup would fail and leave partially configured network above?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@apostasie
Copy link
Contributor

@fahedouch couple of non critical nits:

  • we can ensure the port is free to avoid future collisions instead of using 8080
  • minor comments on testing

Anyhow, LGTM - let's have this in, it is a big deal.

t.Fatalf("Failed to get detached network namespace: %v", err)
} else {
if netns != "" {
// First enter the user and mount namespace, then the network namespace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably you can use containerd-rootless-setuptool.sh nsenter -- nsenter --net=... to simplify the code

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


// Construct the path to the network namespace
uid := os.Getuid()
netnsPath := fmt.Sprintf("/run/user/%d/containerd-rootless/netns", uid)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This variable is redundant. The path is provided in the netns string below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@fahedouch fahedouch force-pushed the clean-up-iptables-in-oci-post-hook branch from 1176445 to 3cea6c8 Compare June 2, 2025 16:03
if netns != "" {
// Use containerd-rootless-setuptool.sh to enter the RootlessKit namespace
return helpers.Custom("containerd-rootless-setuptool.sh", "nsenter", "--", "sh", "-c",
fmt.Sprintf("nsenter --net=%s sh -c '%s'", netns, iptablesCheckCommand))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to use sh -c twice.
The first one can be replaced with "c-r-s.sh", "nsenter", "--", "nsenter", "--net="+netns, ...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also probably sh -c should be sh -ec

for _, rule := range rules {
if strings.Contains(rule, containerID) {
// Execute delete command
deleteCmd := exec.Command("sh", "-c", "--", fmt.Sprintf(`iptables -t %s -D %s`, table, rule[3:]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to use sh -c

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to pass string cmd with sh -c to avoid escaping issues. see #4255 (comment)

@fahedouch fahedouch force-pushed the clean-up-iptables-in-oci-post-hook branch from 3cea6c8 to e9ac446 Compare June 3, 2025 13:20
@fahedouch fahedouch requested a review from AkihiroSuda June 3, 2025 13:23
// Remove the container
helpers.Ensure("rm", "-f", containerID)

time.Sleep(1 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious if this is necessary. Is cni teardown going to be asynchronous?

Copy link
Member

@AkihiroSuda AkihiroSuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@AkihiroSuda AkihiroSuda merged commit 32077c5 into containerd:main Jun 5, 2025
57 of 58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants