
config-linux: Deprecate device access denial #1214

Open: Werkov wants to merge 1 commit into main from deprecate-device-deny

@Werkov commented Jul 12, 2023

Separate allow/deny lists are specific to the device controller, which exists only in cgroup v1. The current semantics for devices that match neither an allow nor a deny entry are confusing.

cgroup v2 implements device access control on the default hierarchy with BPF hooks. Follow the approach of systemd (refer to systemd.resource-control(5)) with DevicePolicy=strict, i.e. consider all devices denied by default and add entries only for devices that should be allowed.
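A minimal Go sketch of the deny-by-default model described above. It assumes a local `deviceRule` struct that only mirrors the field names of the OCI `linux.resources.devices` entries (`allow`, `type`, `major`, `minor`, `access`); the names `deviceRule`, `device`, `ptr` and `allowedStrict` are hypothetical, and this is not systemd's or any runtime's implementation. A nil major or minor acts as a wildcard, and an unset type is treated as `a` (all).

```go
package main

import (
	"fmt"
	"strings"
)

// deviceRule mirrors the fields of an OCI "linux.resources.devices" entry.
// A nil Major or Minor is a wildcard; an empty Type is treated as "a" (all).
type deviceRule struct {
	Allow  bool
	Type   string // "a", "c" or "b"
	Major  *int64
	Minor  *int64
	Access string // subset of "rwm"
}

// device is one access attempt: type, major:minor and a single
// requested permission ("r", "w" or "m").
type device struct {
	Type   string
	Major  int64
	Minor  int64
	Access string
}

func ptr(v int64) *int64 { return &v }

// matches reports whether rule r covers the access attempt d.
func (r deviceRule) matches(d device) bool {
	if r.Type != "" && r.Type != "a" && r.Type != d.Type {
		return false
	}
	if r.Major != nil && *r.Major != d.Major {
		return false
	}
	if r.Minor != nil && *r.Minor != d.Minor {
		return false
	}
	return strings.Contains(r.Access, d.Access)
}

// allowedStrict grants access only if some allow entry matches;
// everything else is denied by default.
func allowedStrict(rules []deviceRule, d device) bool {
	for _, r := range rules {
		if r.Allow && r.matches(d) {
			return true
		}
	}
	return false
}

func main() {
	rules := []deviceRule{
		{Allow: true, Type: "c", Major: ptr(1), Minor: ptr(3), Access: "rwm"}, // /dev/null
		{Allow: true, Type: "c", Major: ptr(136), Access: "rw"},               // /dev/pts/*
	}
	fmt.Println(allowedStrict(rules, device{"c", 1, 3, "w"}))   // true
	fmt.Println(allowedStrict(rules, device{"c", 136, 5, "m"})) // false: mknod not granted
	fmt.Println(allowedStrict(rules, device{"b", 8, 0, "r"}))   // false: nothing matches
}
```

Only allow entries are consulted, so a device that no entry mentions is refused, which is the property DevicePolicy=strict provides.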

This will simplify the job for runtimes that use systemd for container cgroup configuration.

For starters, mention that "allow" entries that don't follow this approach are deprecated. The next step would be removing the "allow" attribute and implicitly denying all devices.
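The deprecation step could be checked mechanically. The following hypothetical Go linter (`lintDeviceRules` and `isDenyAll` are illustrative names, not part of any runtime) flags every entry whose `allow` is false. It assumes that the conventional leading deny-all entry would simply become redundant once denial is implicit, and reports it separately from other deny entries, which the proposed model could no longer express. The struct again only mirrors the spec's field names.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceRule mirrors the fields of an OCI "linux.resources.devices" entry.
type deviceRule struct {
	Allow  bool
	Type   string
	Major  *int64
	Minor  *int64
	Access string
}

func ptr(v int64) *int64 { return &v }

// isDenyAll reports whether r is the conventional "deny everything" entry:
// no type restriction (or "a"), wildcard major/minor, and all of r, w, m.
func isDenyAll(r deviceRule) bool {
	return !r.Allow && (r.Type == "" || r.Type == "a") &&
		r.Major == nil && r.Minor == nil &&
		strings.Contains(r.Access, "r") &&
		strings.Contains(r.Access, "w") &&
		strings.Contains(r.Access, "m")
}

// lintDeviceRules returns one warning per entry whose Allow is false.
func lintDeviceRules(rules []deviceRule) []string {
	var warnings []string
	for i, r := range rules {
		switch {
		case r.Allow:
			// Allow entries fit the deny-by-default model; nothing to report.
		case isDenyAll(r):
			warnings = append(warnings, fmt.Sprintf("entry %d: deny-all would become implicit", i))
		default:
			warnings = append(warnings, fmt.Sprintf("entry %d: deny entry is deprecated", i))
		}
	}
	return warnings
}

func main() {
	rules := []deviceRule{
		{Allow: false, Access: "rwm"},                                          // deny all
		{Allow: true, Type: "c", Major: ptr(1), Minor: ptr(3), Access: "rwm"},  // /dev/null
		{Allow: false, Type: "c", Major: ptr(1), Minor: ptr(1), Access: "rwm"}, // /dev/mem
	}
	for _, w := range lintDeviceRules(rules) {
		fmt.Println(w)
	}
}
```

For this rule list the program prints one warning for entry 0 (the deny-all) and one for entry 2 (the `/dev/mem` deny entry).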

@Werkov (Author) commented Jan 5, 2024

Bump?

@utam0k (Member) commented Jan 6, 2024

In cgroup v2, runc and other major OCI runtimes have implemented not only a systemd driver but also their own cgroup v2 driver. What about that driver's behavior? As far as I know, it attempts to emulate cgroup v1's behavior.

@Werkov force-pushed the deprecate-device-deny branch from bb837ae to 95da17d on March 5, 2025 15:55 (commit signed off by Michal Koutný <[email protected]>)
@Werkov (Author) commented Mar 5, 2025

> but also its original cgroup v2 driver.

What does this refer to? The cgroupfs driver? Does it mean that container runtimes synthesize BPF programs that support both allow and deny lists, like the v1 device controller?

@utam0k (Member) commented Mar 9, 2025

> cgroupfs driver?

Yes.

@giuseppe (Member) commented

This looks like a breaking change. When would we be able to remove it?

I am not sure this is the right thing to do anyway. The devices cgroup can be configured this way (and eBPF offers even more flexibility), so there is no need to break the cgroupfs driver just because systemd places more limits on how a cgroup can be configured.
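To illustrate the flexibility point: an ordered list that mixes allow and deny entries can still be folded into one decision function, which is roughly what a single eBPF device program has to encode. The sketch below assumes last-match-wins ordering and an explicit `defaultAllow` fallback for devices that no entry matches; both are assumptions of this example (the unmatched case is exactly what the PR description calls confusing), and `allowedOrdered` is a hypothetical name, not runc's code.

```go
package main

import (
	"fmt"
	"strings"
)

// deviceRule mirrors the fields of an OCI "linux.resources.devices" entry.
type deviceRule struct {
	Allow  bool
	Type   string // "a" (or empty), "c" or "b"
	Major  *int64 // nil = any major
	Minor  *int64 // nil = any minor
	Access string // subset of "rwm"
}

// device is one access attempt with a single requested permission.
type device struct {
	Type   string
	Major  int64
	Minor  int64
	Access string // "r", "w" or "m"
}

func ptr(v int64) *int64 { return &v }

func (r deviceRule) matches(d device) bool {
	if r.Type != "" && r.Type != "a" && r.Type != d.Type {
		return false
	}
	if r.Major != nil && *r.Major != d.Major {
		return false
	}
	if r.Minor != nil && *r.Minor != d.Minor {
		return false
	}
	return strings.Contains(r.Access, d.Access)
}

// allowedOrdered scans the rules from last to first, so the last matching
// entry decides; if no entry matches, defaultAllow is returned.
func allowedOrdered(rules []deviceRule, d device, defaultAllow bool) bool {
	for i := len(rules) - 1; i >= 0; i-- {
		if rules[i].matches(d) {
			return rules[i].Allow
		}
	}
	return defaultAllow
}

func main() {
	rules := []deviceRule{
		{Allow: false, Access: "rwm"},                                          // deny all
		{Allow: true, Type: "c", Major: ptr(1), Access: "rwm"},                 // char 1:*
		{Allow: false, Type: "c", Major: ptr(1), Minor: ptr(1), Access: "rwm"}, // except 1:1 (/dev/mem)
	}
	null := device{"c", 1, 3, "r"}
	mem := device{"c", 1, 1, "r"}
	disk := device{"b", 8, 0, "r"}

	fmt.Println(allowedOrdered(rules, null, false)) // true: entry 1 is the last match
	fmt.Println(allowedOrdered(rules, mem, false))  // false: entry 2 wins
	fmt.Println(allowedOrdered(rules, disk, true))  // false: entry 0 matches
	// Without the leading deny-all, an unmatched device falls back to the default.
	fmt.Println(allowedOrdered(rules[1:], disk, true))  // true
	fmt.Println(allowedOrdered(rules[1:], disk, false)) // false
}
```

The last two calls show that, without a leading deny-all entry, the outcome for an unmatched device depends entirely on the fallback.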

@Werkov (Author) commented Mar 10, 2025

The cgroupfs driver could be left untouched since, as you write, eBPF is versatile enough.
The issue was that runtimes could not express these v1 semantics with the systemd driver, so even though they were supposed to use only the systemd API, they attached extra eBPF programs to achieve the v1 behavior. (And IIRC, those could be lost on a systemd daemon-reload.)

@giuseppe (Member) commented

> cgroupfs driver could be left untouched since as you write, there is versatility of eBPF.

So in that case we won't need to deprecate the current setting, right?

We don't mention systemd anywhere in the cgroups configuration at the moment, so I don't think we can merge this change as it is.

@Werkov (Author) commented Mar 10, 2025

But how could runtimes implement this (deny lists) on top of the systemd driver?
That is what I thought would justify the deprecation.

@giuseppe (Member) commented

They can still implement it using cgroupfs. We can document it, but I don't see why we should block this possibility just because systemd doesn't allow it. Anyway, this is my personal preference; I'm not sure whether other maintainers agree.

@Werkov (Author) commented Mar 11, 2025

I'm coming from a setting where the cgroupfs driver cannot be used in a systemd-managed userspace. I need to check on recent progress, since cgroupfs can be used there as well if the runtime implements delegation properly. In that case, this deprecation proposal would itself be deprecated :-)

One independent argument for deprecating denylists, though, is that they are less secure than explicit allowlists with deny-by-default.
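The security argument can be illustrated with two hypothetical checks over the same device: under a default-allow denylist, a device nobody thought to list is permitted, while a deny-by-default allowlist refuses it. The device numbers are the standard Linux assignments (char 1:1 `/dev/mem`, 1:3 `/dev/null`, 10:200 `/dev/net/tun`); the names `devicePattern`, `permittedByDenylist` and `permittedByAllowlist` are illustrative only and do not come from any runtime.

```go
package main

import (
	"fmt"
	"strings"
)

// devicePattern selects devices; nil Major/Minor are wildcards and an
// empty Type is treated as "a" (all).
type devicePattern struct {
	Type   string
	Major  *int64
	Minor  *int64
	Access string // subset of "rwm"
}

// device is one access attempt with a single requested permission.
type device struct {
	Type   string
	Major  int64
	Minor  int64
	Access string
}

func ptr(v int64) *int64 { return &v }

func (p devicePattern) matches(d device) bool {
	if p.Type != "" && p.Type != "a" && p.Type != d.Type {
		return false
	}
	if p.Major != nil && *p.Major != d.Major {
		return false
	}
	if p.Minor != nil && *p.Minor != d.Minor {
		return false
	}
	return strings.Contains(p.Access, d.Access)
}

// permittedByDenylist: default allow, listed patterns are refused.
func permittedByDenylist(deny []devicePattern, d device) bool {
	for _, p := range deny {
		if p.matches(d) {
			return false
		}
	}
	return true
}

// permittedByAllowlist: default deny, only listed patterns are granted.
func permittedByAllowlist(allow []devicePattern, d device) bool {
	for _, p := range allow {
		if p.matches(d) {
			return true
		}
	}
	return false
}

func main() {
	deny := []devicePattern{{Type: "c", Major: ptr(1), Minor: ptr(1), Access: "rwm"}}  // /dev/mem
	allow := []devicePattern{{Type: "c", Major: ptr(1), Minor: ptr(3), Access: "rwm"}} // /dev/null
	tun := device{"c", 10, 200, "r"}                                                   // /dev/net/tun, listed in neither

	fmt.Println(permittedByDenylist(deny, tun))   // true: never explicitly denied
	fmt.Println(permittedByAllowlist(allow, tun)) // false: never explicitly allowed
}
```

The denylist is only as complete as its author's list of dangerous devices, while the allowlist fails closed for anything not anticipated.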

@giuseppe (Member) commented

That is where we disagree :-) There are already enough opinionated tools. The OCI runtime spec should allow whatever is possible and stay as close as possible to the kernel features that enable it. It should not try to educate users; it is unlikely that an end user uses the OCI runtime directly anyway.

@kolyshkin (Contributor) commented

> The issue was that runtimes could not express this v1 semantics with systemd driver,

They actually can, using BPFProgram=device:<program-path> (since systemd v249), although we do not use it in runc (yet?).

@kolyshkin (Contributor) commented

@cyphar WDYT?


4 participants