-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Create no-new-privs support proposal doc. #180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
#Support "no new privileges" in Kubernetes | ||
|
||
##Description | ||
|
||
In Linux, the `execve` system call can grant more privileges to a newly-created process than its parent process. Considering security issues, since Linux kernel v3.5, there is a new flag named `no_new_privs` added to prevent those new privileges from being granted to the processes. | ||
|
||
`no_new_privs` is inherited across `fork`, `clone` and `execve` and can not be unset. With `no_new_privs` set, `execve` promises not to grant the privilege to do anything that could not have been done without the `execve` call. | ||
|
||
For more details about `no_new_privs`, please check the Linux kernel document [here](https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt). | ||
|
||
Docker started to support `no_new_privs` option since 1.11. Here is the [link](https://github.com/docker/docker/issues/20329) of the ticket in Docker community to support `no_new_privs` option. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You can also mention that the OCI runtime spec had support for it as of this pr. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also might mention its on by default in docker There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jessfraz It isn't on by default AFAIK. One needs to pass --security-opt=no-new-privileges to enable it. Did this change recently? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. crap I guess we didn't do that, such sad There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. fwiw there were issues with SELINUX, idk if these have since been fixed in fedora, but maybe its best to not default to on moby/moby#23981 (comment) |
||
|
||
We want to support the creation of containers with `no_new_privs` enabled in Kubernetes, which will make the Kubernetes cluster more safe. Here is the [link](https://github.com/kubernetes/kubernetes/issues/38417) of the ticket in Kubernetes community to track this proposal. | ||
|
||
|
||
##Current implementation | ||
|
||
###Support in Docker | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. rkt has an isolator for it, but does not expose it as a CLI flag at this time. It's a kinda low-level knob and asking the user to tweak it isn't great. To change it in rkt, an image you package can indicate whether it should be run this way or not (e.g. when I package nginx I might assert it can run with no-new-privs and rkt will apply it, but when I package an application which uses suid ping I wouldn't set the isolator and rkt wouldn't apply it). It wouldn't be too difficult to expose it on the rkt cli if we needed to (though I don't think we see much value in that) or for rktlet to tweak that lower level knob. cc @lucab There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @timstclair, is that ok to only support docker runtime for now? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. runC does have a knob for this in the runtime-spec, so it would only be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yes, for now, I suggest we can hold on for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. cc @lucab @jonboulle There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Regardless rkt supports no_new_privs since May 2016. So, we can make whatever change are necessary to support this Kubernetes API. I filed a tracking issue against rkt to ensure it is exposed in a way that it can be consumed by cri-rkt. rkt/rkt#3491 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd expect this feature to be turned on by default and so I don't see a need for exposing a CLI knob. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's unclear from @jessfraz above whether it can realistically be turned on by default, and even if so, it probably needs some way to disable if, for example, you want to ship Is this equivalent to mounting NOSUID ? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is, however, a VERY low-level mechanism-oriented flag. Is there a way we can re-express this with clearer high-level intent instead? |
||
|
||
Since Docker 1.11, user can specify `--security-opt` to enable `no_new_privs` while creating containers, e.g. `docker run --security-opt=no-new-privileges busybox` | ||
|
||
For program client, Docker provides an object named `ContainerCreateConfig` defined in package `github.com/docker/engine-api/types` to config container creation parameters. In this object, there is a string array `HostConfig.SecurityOpt` to specify the security options. Client can utilize this field to specify the arguments for security options while creating new containers. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe add a very short section about runC / OCI
|
||
###Support in OCI runtimes | ||
|
||
Since version 0.3.0 of the OCI runtime specification, a user can specify the `noNewPrivs` boolean flag in the configuration file. | ||
|
||
More details of OCI implementation can be checked [here](https://github.com/opencontainers/runtime-spec/pull/290). | ||
|
||
###SecurityContext in Kubernetes | ||
|
||
Kubernetes defines `SecurityContext` for `Container` and `PodSecurityContext` for `PodSpec`. `SecurityContext` objects define the related security options for Kubernetes containers, e.g. selinux options. | ||
|
||
While creating a container, kubelet parses the security context object and formats the security option strings for Docker. The security options strings will finally be inserted into `ContainerCreateConfig.HostConfig.SecurityOpt` and passed to Docker. Different Kubernetes runtimes now are using different methods to parse and format the security option strings: | ||
* method `#getSecurityOpts` in `docker_mager_xxxx.go` for Docker runtime | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. s/mager/manager |
||
* method `#getContainerSecurityOpts` in `docker_container.go` for CRI | ||
|
||
|
||
##Proposal to support "no new privileges" | ||
|
||
To support "no new privileges" options in Kubernetes, it is proposed to make the following changes: | ||
|
||
###Changes of SecurityContext objects | ||
|
||
Add a new bool type field named `noNewPrivileges` to both `SecurityContext` definition and `PodSecurityContext` definition: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Does There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @timstclair It should be disabled when privileged is enabled. Agree that behavior should be clarified. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We don't use negative names, and I agree with Tim that this is low level. I'd suggest |
||
* `noNewPrivileges=true` in `PodSecurityContext` means that all the containers in the pod should be run with `no-new-privileges` enabled. This should be a pod level control of `no-new-privileges` flag. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is defaulting to true so you can maintain backwards compatibility? Any thoughts on defaulting to the secure setting (I guess that would mean the var needs to be something like |
||
* `noNewPrivileges` in `SecurityContext` is a container level control of `no-new-privileges` flag, and can override the pod level `noNewPrivileges` setting. | ||
|
||
By default, `noNewPrivileges` is `false`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would be the drawback if we set There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I honestly don't think it would break many turning it on by default for non privileged containers, +1 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would break anything with a suid bit (e.g. ping). Unfortunately this probably breaks enough containers that we can't enable it by default, but we can document that it's a best practice to set it to false in the |
||
|
||
The change of security context API objects requires the update of corresponding Kubernetes documents, need to submit another PR to track this. | ||
|
||
###Changes of docker runtime | ||
|
||
When parsing the new `SecurityContext` object, kubelet has to take care of `noNewPrivileges` field from security context objects. Once `noNewPrivileges` is `true`, kubelet needs to change `#getSecurityOpts` method in `docker_manager_xxx.go` to add `no-new-privileges` option to `ContainerCreateConfig.HostConfig.SecurityOpt` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this logic may be getting spread around a bit. I would suggest it be added to the security context provider: https://github.com/kubernetes/kubernetes/blob/bcc783c594652df29c8b7d45ae75f26ab7b5b65a/pkg/securitycontext/provider.go#L54 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We are in the middle of migrating to the CRI implementation in 1.6, if everything goes well (fingers crossed). In the meantime, I would like to freeze development in the old implementation (pkg/kubelet/dockertools) if possible. I would suggest removing all sections referring the old implementation from this proposal. p.s. I recently moved all docker-specific code from |
||
|
||
###Changes of CRI runtime | ||
|
||
When parsing the new `SecurityContext` object, kubelet has to take care of `noNewPrivileges` field from security context objects. Once `noNewPrivileges` is `true`, kubelet needs to change `#getContainerSecurityOpts` method in `docker_container.go` to add `no-new-privileges` option to `ContainerCreateConfig.HostConfig.SecurityOpt` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: I prefer a higher-level description since the code (e.g., the function/file names) changes frequently. I think you need to add a new field, e.g., |
||
|
||
###Changes of kubectl | ||
|
||
This is an additional proposal for kubectl. To improve kubectl user experience, we can add a new flag for kubectl command named `--security-opt`. This flag allows user to create pod with security options configured when using `kubectl run` command. For example, if user issues command like `kubectl run busybox --image=busybox --security-opt=no-new-privileges -- top`, kubernetes shall create a pod with `noNewPrivileges` enabled. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there any more description of what this flag might be extended to accept? Or do you just want to define There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @cyphar, for now, I'm thinking just define There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1 @vishh its not like that flag already exists to support apparmor and seccomp, adding it would only cause confusion and lead people to believe you can pass the same args as the docker cli tbh There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also I always hated how we named that flag because it didn't scale well ;) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed. Let's remove this section from this proposal. If you want to be able to set security options via kubectl run, you can open a separate proposal for that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree with vish, this is not an action we would expose through kubectl. It's probably a candidate for "kubectl set security-context" or something similar though. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. +1. Please do not expose this directly in |
||
|
||
If the proposal of kubectl changes is accepted, the patch can also be submitted as a separate PR. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest that we add a link to https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, let me add the link then.