Skip to content

Create no-new-privs support proposal doc. #180

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
66 changes: 66 additions & 0 deletions contributors/design-proposals/no-new-privs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
#Support "no new privileges" in Kubernetes

##Description

In Linux, the `execve` system call can grant more privileges to a newly-created process than its parent process. Considering security issues, since Linux kernel v3.5, there is a new flag named `no_new_privs` added to prevent those new privileges from being granted to the processes.

`no_new_privs` is inherited across `fork`, `clone` and `execve` and can not be unset. With `no_new_privs` set, `execve` promises not to grant the privilege to do anything that could not have been done without the `execve` call.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me add the link then.


For more details about `no_new_privs`, please check the Linux kernel document [here](https://www.kernel.org/doc/Documentation/prctl/no_new_privs.txt).

Docker started to support `no_new_privs` option since 1.11. Here is the [link](https://github.com/docker/docker/issues/20329) of the ticket in Docker community to support `no_new_privs` option.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can also mention that the OCI runtime spec had support for it as of this pr.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also might mention its on by default in docker

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jessfraz It isn't on by default AFAIK. One needs to pass --security-opt=no-new-privileges to enable it. Did this change recently?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

crap I guess we didn't do that, such sad

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw there were issues with SELINUX, idk if these have since been fixed in fedora, but maybe its best to not default to on moby/moby#23981 (comment)


We want to support the creation of containers with `no_new_privs` enabled in Kubernetes, which will make the Kubernetes cluster more safe. Here is the [link](https://github.com/kubernetes/kubernetes/issues/38417) of the ticket in Kubernetes community to track this proposal.


##Current implementation

###Support in Docker

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@euank @yifan-gu - Does rkt have support for this?

Copy link
Contributor

@euank euank Dec 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rkt has an isolator for it, but does not expose it as a CLI flag at this time.

It's a kinda low-level knob and asking the user to tweak it isn't great.

To change it in rkt, an image you package can indicate whether it should be run this way or not (e.g. when I package nginx I might assert it can run with no-new-privs and rkt will apply it, but when I package an application which uses suid ping I wouldn't set the isolator and rkt wouldn't apply it).

It wouldn't be too difficult to expose it on the rkt cli if we needed to (though I don't think we see much value in that) or for rktlet to tweak that lower level knob.

cc @lucab

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timstclair, is that ok to only support docker runtime for now?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

runC does have a knob for this in the runtime-spec, so it would only be rkt that doesn't support this for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, for now, I suggest we can hold on for rkt implementation and just implement the support for other runtimes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless rkt supports no_new_privs since May 2016. So, we can make whatever change are necessary to support this Kubernetes API. I filed a tracking issue against rkt to ensure it is exposed in a way that it can be consumed by cri-rkt. rkt/rkt#3491

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect this feature to be turned on by default and so I don't see a need for exposing a CLI knob.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear from @jessfraz above whether it can realistically be turned on by default, and even if so, it probably needs some way to disable if, for example, you want to ship ping.

Is this equivalent to mounting NOSUID ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is, however, a VERY low-level mechanism-oriented flag. Is there a way we can re-express this with clearer high-level intent instead?


Since Docker 1.11, user can specify `--security-opt` to enable `no_new_privs` while creating containers, e.g. `docker run --security-opt=no-new-privileges busybox`

For program client, Docker provides an object named `ContainerCreateConfig` defined in package `github.com/docker/engine-api/types` to config container creation parameters. In this object, there is a string array `HostConfig.SecurityOpt` to specify the security options. Client can utilize this field to specify the arguments for security options while creating new containers.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a very short section about runC / OCI

###Support in OCI runtimes

Since version 0.3.0 of the OCI runtime specification, a user can specify the `noNewPrivs` boolean flag in the configuration file.

###Support in OCI runtimes

Since version 0.3.0 of the OCI runtime specification, a user can specify the `noNewPrivs` boolean flag in the configuration file.

More details of OCI implementation can be checked [here](https://github.com/opencontainers/runtime-spec/pull/290).

###SecurityContext in Kubernetes

Kubernetes defines `SecurityContext` for `Container` and `PodSecurityContext` for `PodSpec`. `SecurityContext` objects define the related security options for Kubernetes containers, e.g. selinux options.

While creating a container, kubelet parses the security context object and formats the security option strings for Docker. The security options strings will finally be inserted into `ContainerCreateConfig.HostConfig.SecurityOpt` and passed to Docker. Different Kubernetes runtimes now are using different methods to parse and format the security option strings:
* method `#getSecurityOpts` in `docker_mager_xxxx.go` for Docker runtime
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/mager/manager

* method `#getContainerSecurityOpts` in `docker_container.go` for CRI


##Proposal to support "no new privileges"

To support "no new privileges" options in Kubernetes, it is proposed to make the following changes:

###Changes of SecurityContext objects

Add a new bool type field named `noNewPrivileges` to both `SecurityContext` definition and `PodSecurityContext` definition:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does noNewPrivileges have any meaning for a privileged container? If so, please explain it in the document. If not, please explain how it will be handled (i.e. do we reject it as an invalid config?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@timstclair It should be disabled when privileged is enabled. Agree that behavior should be clarified.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't use negative names, and I agree with Tim that this is low level. I'd suggest allowPrivilegeEscalation or something that uses a word to capture the appropriate level of risk. We should not default this in the API, but at the kubelet, and use podsecuritypolicy to set appropriate defaults (which is how we treat these other flags).

* `noNewPrivileges=true` in `PodSecurityContext` means that all the containers in the pod should be run with `no-new-privileges` enabled. This should be a pod level control of `no-new-privileges` flag.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is defaulting to true so you can maintain backwards compatibility? Any thoughts on defaulting to the secure setting (I guess that would mean the var needs to be something like newPrivs so when it is left off of a pod spec you're not getting any extra permissions? This is how things like hostPID and hostNetwork work.

* `noNewPrivileges` in `SecurityContext` is a container level control of `no-new-privileges` flag, and can override the pod level `noNewPrivileges` setting.

By default, `noNewPrivileges` is `false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the drawback if we set no_new_privs by default for all non privileged containers? Would it break any existing apps? Do we need a knob for sure?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly don't think it would break many turning it on by default for non privileged containers, +1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would break anything with a suid bit (e.g. ping). Unfortunately this probably breaks enough containers that we can't enable it by default, but we can document that it's a best practice to set it to false in the PodSecurityPolicy.


The change of security context API objects requires the update of corresponding Kubernetes documents, need to submit another PR to track this.

###Changes of docker runtime

When parsing the new `SecurityContext` object, kubelet has to take care of `noNewPrivileges` field from security context objects. Once `noNewPrivileges` is `true`, kubelet needs to change `#getSecurityOpts` method in `docker_manager_xxx.go` to add `no-new-privileges` option to `ContainerCreateConfig.HostConfig.SecurityOpt`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this logic may be getting spread around a bit. I would suggest it be added to the security context provider: https://github.com/kubernetes/kubernetes/blob/bcc783c594652df29c8b7d45ae75f26ab7b5b65a/pkg/securitycontext/provider.go#L54

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are in the middle of migrating to the CRI implementation in 1.6, if everything goes well (fingers crossed). In the meantime, I would like to freeze development in the old implementation (pkg/kubelet/dockertools) if possible. I would suggest removing all sections referring the old implementation from this proposal.

p.s. I recently moved all docker-specific code from pkg/securitycontext into pkg/kubelet/dockertools/securitycontext, so the link @pweil- gave above is outdated.


###Changes of CRI runtime

When parsing the new `SecurityContext` object, kubelet has to take care of `noNewPrivileges` field from security context objects. Once `noNewPrivileges` is `true`, kubelet needs to change `#getContainerSecurityOpts` method in `docker_container.go` to add `no-new-privileges` option to `ContainerCreateConfig.HostConfig.SecurityOpt`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I prefer a higher-level description since the code (e.g., the function/file names) changes frequently.

I think you need to add a new field, e.g., AllowPrivilegeEscalation in the CRI LinuxSandboxSecurityContext. Without the API change, kubelet can't tell the runtime to do this.


###Changes of kubectl

This is an additional proposal for kubectl. To improve kubectl user experience, we can add a new flag for kubectl command named `--security-opt`. This flag allows user to create pod with security options configured when using `kubectl run` command. For example, if user issues command like `kubectl run busybox --image=busybox --security-opt=no-new-privileges -- top`, kubernetes shall create a pod with `noNewPrivileges` enabled.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any more description of what this flag might be extended to accept? Or do you just want to define --security-opt=no-new-privileges for now and we can expand it later?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyphar, for now, I'm thinking just define --security-opt=no-new-privileges, but we can expand it with other options support. I've discussed this with @timstclair on this part before, maybe we can start another ticket to resolve the kubectl enhancement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kubectl run is intentionally simple. I don't see a need for exposing this kind of a knob via kubectl.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 @vishh its not like that flag already exists to support apparmor and seccomp, adding it would only cause confusion and lead people to believe you can pass the same args as the docker cli tbh

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I always hated how we named that flag because it didn't scale well ;)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Let's remove this section from this proposal. If you want to be able to set security options via kubectl run, you can open a separate proposal for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with vish, this is not an action we would expose through kubectl. It's probably a candidate for "kubectl set security-context" or something similar though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Please do not expose this directly in kubectl run.


If the proposal of kubectl changes is accepted, the patch can also be submitted as a separate PR.