Commit 9ca6fd9

Merge pull request containerd#9117 from kinvolk/rata/userns-chown-opt-in
Require opt-in for rootfs chown when idmap mounts is not supported
2 parents 719fa3d + 8e3722c commit 9ca6fd9

5 files changed, +79 -10 lines changed

docs/user-namespaces/README.md

+35 -7
@@ -86,22 +86,22 @@ Different containerd versions have different limitations too, those are highligh
 ### containerd 1.7
 
 One limitation present in containerd 1.7 is that it needs to change the ownership of every file and
-directory inside the container image, during Pod startup. This means it has a storage overhead (the
-size of the container image is duplicated each time a pod is created) and can significantly impact
-the container startup latency.
+directory inside the container image, during Pod startup. This means it has a storage overhead, as
+**the size of the container image is duplicated each time a pod is created**, and can significantly
+impact the container startup latency, as doing such a copy takes time too.
 
 You can mitigate this limitation by switching `/sys/module/overlay/parameters/metacopy` to `Y`. This
 will significantly reduce the storage and performance overhead, as only the inode for each file of
 the container image will be duplicated, but not the content of the file. This means it will use less
 storage and it will be faster. However, it is not a panacea.
 
-If you change the metacopy param, make sure to do it in a way that is persistant across reboots. You
+If you change the metacopy param, make sure to do it in a way that is persistent across reboots. You
 should also be aware that this setting will be used for all containers, not just containers with
 user namespaces enabled. This will affect all the snapshots that you take manually (if you happen to
 do that). In that case, make sure to use the same value of `/sys/module/overlay/parameters/metacopy`
 when creating and restoring the snapshot.
 
-### containerd 2.0
+### containerd 2.0 and above
 
 The storage and latency limitation from containerd 1.7 are not present in container 2.0 and above,
 if you use the overlay snapshotter (this is used by default). It will not use more storage at all,
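
The README text in this hunk refers to the `/sys/module/overlay/parameters/metacopy` parameter. As a minimal sketch (not part of this commit), the current value could be inspected from Go as shown below. The helper name `parseMetacopy` is hypothetical, and the sketch assumes the kernel reports this boolean module parameter as `Y` or `N`.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// metacopyPath is the sysfs parameter named in the README text above.
const metacopyPath = "/sys/module/overlay/parameters/metacopy"

// parseMetacopy is a hypothetical helper: it interprets the raw file
// contents, assuming the kernel prints "Y" or "N" for this parameter.
func parseMetacopy(raw string) (bool, error) {
	switch strings.TrimSpace(raw) {
	case "Y":
		return true, nil
	case "N":
		return false, nil
	default:
		return false, fmt.Errorf("unexpected metacopy value %q", raw)
	}
}

func main() {
	raw, err := os.ReadFile(metacopyPath)
	if err != nil {
		// The file is missing when the overlay module is not loaded.
		fmt.Println("could not read metacopy parameter:", err)
		return
	}
	enabled, err := parseMetacopy(string(raw))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("overlay metacopy enabled:", enabled)
}
```

This only reads the value. Making a change persistent across reboots, as the README asks, is a host configuration matter and is not covered by the sketch.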
@@ -111,8 +111,36 @@ This is achieved by using the kernel feature idmap mounts with the container roo
 image). This allows an overlay file-system to expose the image with different UID/GID without copying
 the files nor the inodes, just using a bind-mount.
 
-You can check if you are using idmap mounts for the container image if you create a pod with user
-namespaces, exec into it and run:
+Containerd by default will refuse to create a container with user namespaces, if overlayfs is the
+snapshotter and the kernel running doesn't support idmap mounts for overlayfs. This is to make sure
+before falling back to the expensive chown (in terms of storage and pod startup latency), you
+understand the implications and decide to opt-in. Please read the containerd 1.7 limitations for an
+explanation of those.
+
+If your kernel doesn't support idmap mounts for the overlayfs snapshotter, you will see an error
+like:
+
+```
+failed to create containerd container: snapshotter "overlayfs" doesn't support idmap mounts on this host, configure `slow_chown` to allow a slower and expensive fallback
+```
+
+Linux supports idmap mounts on an overlayfs since version 5.19.
+
+You can opt-in for the slow chown by adding the `slow_chown` field to your config in the overlayfs
+snapshotter section, like this:
+
+```
+[plugins."io.containerd.snapshotter.v1.overlayfs"]
+slow_chown = true
+```
+
+Note that only overlayfs users need to opt-in for the slow chown, as it as it is the only one that
+containerd provides a better option (only the overlayfs snapshotter supports idmap mounts in
+containerd). If you use another snapshotter, you will fall-back to the expensive chown without the
+need to opt-in.
+
+That being said, you can double check if your container is using idmap mounts for the container
+image if you create a pod with user namespaces, exec into it and run:
 
 ```
 mount | grep overlay
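
The added README text states that Linux supports idmap mounts on overlayfs since version 5.19. As a rough illustration (not something this commit does), the sketch below compares the running kernel release against that version. The helper name `kernelAtLeast` is hypothetical. A version comparison is only a heuristic, since distribution kernels may backport or disable features, so it should not be read as how containerd detects support.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// kernelAtLeast is a hypothetical helper: it reports whether a release
// string such as "5.19.0-46-generic" is at or above major.minor.
func kernelAtLeast(release string, major, minor int) (bool, error) {
	var gotMajor, gotMinor int
	// Sscanf stops once the format is consumed, so suffixes like
	// ".0-46-generic" are ignored.
	if _, err := fmt.Sscanf(release, "%d.%d", &gotMajor, &gotMinor); err != nil {
		return false, fmt.Errorf("cannot parse kernel release %q: %w", release, err)
	}
	if gotMajor != major {
		return gotMajor > major, nil
	}
	return gotMinor >= minor, nil
}

func main() {
	raw, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("could not read kernel release:", err)
		return
	}
	release := strings.TrimSpace(string(raw))
	ok, err := kernelAtLeast(release, 5, 19)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("kernel %s is 5.19 or newer: %v\n", release, ok)
}
```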

script/test/utils.sh

+7
@@ -52,6 +52,13 @@ version=2
 
 [plugins."io.containerd.grpc.v1.cri"]
 drain_exec_sync_io_timeout = "10s"
+
+# Userns requires idmap mount support for overlayfs (added in 5.19)
+# Let's opt-in for a recursive chown, so we can always test this even in old distros.
+# Note that if idmap mounts support is present, we will use that, so it is harmless to keep this
+# here.
+[plugins."io.containerd.snapshotter.v1.overlayfs"]
+slow_chown = true
 EOF
 
 if command -v sestatus >/dev/null 2>&1; then

snapshots/overlay/overlay.go

+8
@@ -46,6 +46,7 @@ type SnapshotterConfig struct {
 	ms            MetaStore
 	mountOptions  []string
 	remapIds      bool
+	slowChown     bool
 }
 
 // Opt is an option to configure the overlay snapshotter
@@ -98,13 +99,19 @@ func WithRemapIds(config *SnapshotterConfig) error {
 	return nil
 }
 
+func WithSlowChown(config *SnapshotterConfig) error {
+	config.slowChown = true
+	return nil
+}
+
 type snapshotter struct {
 	root          string
 	ms            MetaStore
 	asyncRemove   bool
 	upperdirLabel bool
 	options       []string
 	remapIds      bool
+	slowChown     bool
 }
 
 // NewSnapshotter returns a Snapshotter which uses overlayfs. The overlayfs
@@ -161,6 +168,7 @@ func NewSnapshotter(root string, opts ...Opt) (snapshots.Snapshotter, error) {
 		upperdirLabel: config.upperdirLabel,
 		options:       config.mountOptions,
 		remapIds:      config.remapIds,
+		slowChown:     config.slowChown,
 	}, nil
 }
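
The `WithSlowChown` function added above follows the functional options pattern already used by `WithRemapIds`. The sketch below reproduces that pattern in isolation. The `config` type and the `newConfig` constructor are simplified, hypothetical stand-ins for `SnapshotterConfig` and `NewSnapshotter`, not the real containerd types.

```go
package main

import "fmt"

// config is a hypothetical, reduced stand-in for SnapshotterConfig.
type config struct {
	remapIds  bool
	slowChown bool
}

// Opt mutates the config and may reject invalid settings.
type Opt func(c *config) error

// WithSlowChown mirrors the option added in this commit: it only flips
// the opt-in flag.
func WithSlowChown(c *config) error {
	c.slowChown = true
	return nil
}

// newConfig is a hypothetical constructor that applies each option in
// order and stops at the first error.
func newConfig(opts ...Opt) (*config, error) {
	var c config
	for _, opt := range opts {
		if err := opt(&c); err != nil {
			return nil, err
		}
	}
	return &c, nil
}

func main() {
	c, err := newConfig(WithSlowChown)
	if err != nil {
		fmt.Println("invalid options:", err)
		return
	}
	fmt.Println("slow chown allowed:", c.slowChown)
}
```

Because the zero value of `slowChown` is `false`, callers that pass no option keep the stricter default, which matches the opt-in intent described in the commit title.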

snapshots/overlay/plugin/plugin.go

+15 -1
@@ -28,7 +28,8 @@ import (
 )
 
 const (
-	capaRemapIds = "remap-ids"
+	capaRemapIds     = "remap-ids"
+	capaOnlyRemapIds = "only-remap-ids"
 )
 
 // Config represents configuration for the overlay plugin.
@@ -38,6 +39,11 @@ type Config struct {
 	UpperdirLabel bool   `toml:"upperdir_label"`
 	SyncRemove    bool   `toml:"sync_remove"`
 
+	// slowChown allows the plugin to fallback to a recursive chown if fast options (like
+	// idmap mounts) are not available. See more info about the overhead this can have in
+	// github.com/containerd/containerd/docs/user-namespaces/.
+	SlowChown bool `toml:"slow_chown"`
+
 	// MountOptions are options used for the overlay mount (not used on bind mounts)
 	MountOptions []string `toml:"mount_options"`
 }
@@ -76,6 +82,14 @@ func init() {
 				ic.Meta.Capabilities = append(ic.Meta.Capabilities, capaRemapIds)
 			}
 
+			if config.SlowChown {
+				oOpts = append(oOpts, overlay.WithSlowChown)
+			} else {
+				// If slowChown is false, we use capaOnlyRemapIds to signal we only
+				// allow idmap mounts.
+				ic.Meta.Capabilities = append(ic.Meta.Capabilities, capaOnlyRemapIds)
+			}
+
 			ic.Meta.Exports["root"] = root
 			return overlay.NewSnapshotter(root, oOpts...)
 		},
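
To make the plugin-side change easier to follow, here is a small sketch of which capabilities end up advertised for each combination of inputs. The function `advertisedCapabilities` is hypothetical. The hunk above does not show the condition that guards `capaRemapIds`, so the `idmapSupported` parameter is an assumption about it.

```go
package main

import "fmt"

const (
	capaRemapIds     = "remap-ids"
	capaOnlyRemapIds = "only-remap-ids"
)

// advertisedCapabilities is a hypothetical helper modelling the plugin's
// init logic. idmapSupported is an assumed stand-in for the condition
// that guards capaRemapIds, which is not visible in the diff.
func advertisedCapabilities(idmapSupported, slowChown bool) []string {
	var capabs []string
	if idmapSupported {
		capabs = append(capabs, capaRemapIds)
	}
	if !slowChown {
		// Without the opt-in, signal that only idmap mounts are allowed.
		capabs = append(capabs, capaOnlyRemapIds)
	}
	return capabs
}

func main() {
	for _, idmap := range []bool{true, false} {
		for _, slow := range []bool{true, false} {
			fmt.Printf("idmap=%v slow_chown=%v -> %q\n",
				idmap, slow, advertisedCapabilities(idmap, slow))
		}
	}
}
```

Note that `only-remap-ids` is advertised whenever `slow_chown` is unset, regardless of idmap support. It is the client side that decides whether that capability turns into an error.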

snapshotter_opts_unix.go

+14 -2
@@ -26,7 +26,8 @@ import (
 )
 
 const (
-	capabRemapIDs = "remap-ids"
+	capaRemapIDs     = "remap-ids"
+	capaOnlyRemapIds = "only-remap-ids"
 )
 
 // WithRemapperLabels creates the labels used by any supporting snapshotter
@@ -45,7 +46,7 @@ func resolveSnapshotOptions(ctx context.Context, client *Client, snapshotterName
 	}
 
 	for _, capab := range capabs {
-		if capab == capabRemapIDs {
+		if capab == capaRemapIDs {
 			// Snapshotter supports ID remapping, we don't need to do anything.
 			return parent, nil
 		}
@@ -72,6 +73,17 @@ func resolveSnapshotOptions(ctx context.Context, client *Client, snapshotterName
 		return parent, nil
 	}
 
+	capaOnlyRemap := false
+	for _, capa := range capabs {
+		if capa == capaOnlyRemapIds {
+			capaOnlyRemap = true
+		}
+	}
+
+	if capaOnlyRemap {
+		return "", fmt.Errorf("snapshotter %q doesn't support idmap mounts on this host, configure `slow_chown` to allow a slower and expensive fallback", snapshotterName)
+	}
+
 	var ctrUID, hostUID, length uint32
 	_, err = fmt.Sscanf(uidMap, "%d:%d:%d", &ctrUID, &hostUID, &length)
 	if err != nil {
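
The client-side decision in `resolveSnapshotOptions` can be summarised as a three-way choice based on the capabilities the snapshotter reports. The sketch below is a simplified model of that ordering, not the real function: `chooseRemapStrategy`, `contains`, and the returned strategy strings are hypothetical, and the intermediate early return visible in the last hunk (whose condition is not shown) is left out.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	capaRemapIDs     = "remap-ids"
	capaOnlyRemapIds = "only-remap-ids"
)

// errNeedsOptIn is a hypothetical sentinel standing in for the
// formatted error returned by the real code.
var errNeedsOptIn = errors.New("idmap mounts unsupported, configure slow_chown to allow the fallback")

// contains reports whether want is present in list.
func contains(list []string, want string) bool {
	for _, item := range list {
		if item == want {
			return true
		}
	}
	return false
}

// chooseRemapStrategy models the order of checks in the diff:
// remap-ids wins, then only-remap-ids is an error, else fall back to chown.
func chooseRemapStrategy(capabs []string) (string, error) {
	if contains(capabs, capaRemapIDs) {
		return "idmap", nil
	}
	if contains(capabs, capaOnlyRemapIds) {
		return "", errNeedsOptIn
	}
	return "chown", nil
}

func main() {
	samples := [][]string{
		{capaRemapIDs, capaOnlyRemapIds},
		{capaOnlyRemapIds},
		{},
	}
	for _, capabs := range samples {
		strategy, err := chooseRemapStrategy(capabs)
		fmt.Printf("%q -> strategy=%q err=%v\n", capabs, strategy, err)
	}
}
```

Checking `remap-ids` first is what makes it harmless to keep `slow_chown = true` on hosts that do support idmap mounts, as the comment in `script/test/utils.sh` points out.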
