|
| 1 | +--- |
| 2 | +kep-number: 0 |
| 3 | +title: Bounding Self-Labeling Kubelets |
| 4 | +authors: |
| 5 | + - "@mikedanese" |
| 6 | + - "@liggitt" |
| 7 | +owning-sig: sig-auth |
| 8 | +participating-sigs: |
| 9 | + - sig-node |
| 10 | + - sig-storage |
| 11 | +reviewers: |
| 12 | + - "@saad-ali" |
| 13 | + - "@tallclair" |
| 14 | +approvers: |
| 15 | + - "@thockin" |
| 16 | + - "@smarterclayton" |
| 17 | +creation-date: 2017-08-14 |
| 18 | +last-updated: 2018-10-31 |
| 19 | +status: implementable |
| 20 | +--- |
| 21 | + |
| 22 | +# Bounding Self-Labeling Kubelets |
| 23 | + |
| 24 | +## Motivation |
| 25 | + |
| 26 | +Today the node client has total authority over its own Node labels. |
| 27 | +This ability is incredibly useful for the node auto-registration flow. |
| 28 | +The kubelet reports a set of well-known labels, as well as additional |
| 29 | +labels specified on the command line with `--node-labels`. |
| 30 | + |
| 31 | +While this distributed method of registration is convenient and expedient, it |
| 32 | +has two problems that a centralized approach would not have. First, and less seriously, it makes |
| 33 | +management difficult. Instead of configuring labels in a centralized |
| 34 | +place, we must configure `N` kubelet command lines. More significantly, the |
| 35 | +approach greatly compromises security. Below are two straightforward escalations |
| 36 | +on an initially compromised node that exhibit the attack vector. |
| 37 | + |
| 38 | +### Capturing Dedicated Workloads |
| 39 | + |
| 40 | +Suppose company `foo` needs to run an application that deals with PII on |
| 41 | +dedicated nodes to comply with government regulation. A common mechanism for |
| 42 | +implementing dedicated nodes in Kubernetes today is to set a label or taint |
| 43 | +(e.g. `foo/dedicated=customer-info-app`) on the node and to select these |
| 44 | +dedicated nodes in the workload controller running `customer-info-app`. |
| 45 | + |
| 46 | +Since nodes self-report labels upon registration, an intruder can easily |
| 47 | +register a compromised node with label `foo/dedicated=customer-info-app`. The |
| 48 | +scheduler will then bind `customer-info-app` to the compromised node, potentially |
| 49 | +giving the intruder easy access to the PII. |
| 50 | + |
| 51 | +This attack also extends to secrets. Suppose company `foo` runs their |
| 52 | +outward-facing nginx on dedicated nodes to reduce exposure to the company's publicly |
| 53 | +trusted server certificates. They use the secret mechanism to distribute the |
| 54 | +serving certificate key. An intruder captures the dedicated nginx workload in |
| 55 | +the same way and can now use the node certificate to read the company's serving |
| 56 | +certificate key. |
| 57 | + |
| 58 | +## Proposal |
| 59 | + |
| 60 | +1. Modify the `NodeRestriction` admission plugin to prevent Kubelets from self-setting labels |
| 61 | +within the `k8s.io` and `kubernetes.io` namespaces, *except for these specifically allowed labels/prefixes*: |
| 62 | + |
| 63 | + ``` |
| 64 | + kubernetes.io/hostname |
| 65 | + failure-domain.[beta.]kubernetes.io/zone |
| 66 | + failure-domain.[beta.]kubernetes.io/region |
| 67 | + [beta.]kubernetes.io/instance-type |
| 68 | + [beta.]kubernetes.io/os |
| 69 | + [beta.]kubernetes.io/arch |
| 70 | + [*.]kubelet.kubernetes.io/* |
| 71 | + [*.]node.kubernetes.io/* |
| 72 | + ``` |
| 73 | + |
| 74 | +2. Reserve/recommend the `node-restriction.kubernetes.io/*` label prefix for users |
| 75 | +that want to label `Node` objects centrally for isolation purposes. |
| 76 | + |
| 77 | + > The `node-restriction.kubernetes.io/*` label namespace is reserved for cluster deployers |
| 78 | + > to isolate nodes. These labels cannot be self-set by kubelets when the `NodeRestriction` |
| 79 | + > admission plugin is enabled. |
| 80 | + |
| 81 | +This accomplishes the following goals: |
| 82 | + |
| 83 | +- continues allowing people to use arbitrary labels under their own namespaces any way they wish |
| 84 | +- supports legacy labels kubelets are already adding |
| 85 | +- provides a place under the `kubernetes.io` label namespace for node isolation labeling |
| 86 | +- provides a place under the `kubernetes.io` label namespace for kubelets to self-label with kubelet and node-specific labels |
| 87 | + |
| 88 | +## Implementation Timeline |
| 89 | + |
| 90 | +v1.13: |
| 91 | + |
| 92 | +* Kubelet deprecates setting `kubernetes.io` or `k8s.io` labels via `--node-labels`, |
| 93 | +other than the specifically allowed labels/prefixes described above, |
| 94 | +and warns when invoked with `kubernetes.io` or `k8s.io` labels outside that set. |
| 95 | +* NodeRestriction admission prevents kubelets from setting `kubernetes.io` or `k8s.io` |
| 96 | +labels other than the specifically allowed labels/prefixes described above on Node *update* (not on Node create) |
| 97 | + |
| 98 | +v1.15: |
| 99 | + |
| 100 | +* Kubelet removes the ability to set `kubernetes.io` or `k8s.io` labels via `--node-labels` |
| 101 | +other than the specifically allowed labels/prefixes described above (deprecation period |
| 102 | +of 6 months for CLI elements of admin-facing components is complete) |
| 103 | + |
| 104 | +v1.17: |
| 105 | + |
| 106 | +* NodeRestriction admission prevents kubelets from setting `kubernetes.io` or `k8s.io` labels |
| 107 | +other than the specifically allowed labels/prefixes described above on Node create as well |
| 108 | +(oldest supported kubelet running against a v1.17 apiserver is v1.15) |
| 109 | + |
| 110 | +## Alternatives Considered |
| 111 | + |
| 112 | +### File or flag-based configuration of the apiserver to allow specifying allowed labels |
| 113 | + |
| 114 | +* A fixed set of labels and label prefixes is simpler to reason about, and makes every cluster behave consistently |
| 115 | +* File-based config isn't easily inspectable to be able to verify enforced labels |
| 116 | +* File-based config isn't easily kept in sync in HA apiserver setups |
| 117 | + |
| 118 | +### API-based configuration of the apiserver to allow specifying allowed labels |
| 119 | + |
| 120 | +* A fixed set of labels and label prefixes is simpler to reason about, and makes every cluster behave consistently |
| 121 | +* An API object that controls the allowed labels is a potential escalation path for a compromised node |
| 122 | + |
| 123 | +### Allow kubelets to add any labels they wish, and add NoSchedule taints if disallowed labels are added |
| 124 | + |
| 125 | +* To be robust, this approach would also likely involve a controller to automatically inspect labels and remove the NoSchedule taint. This seemed overly complex. Additionally, it was difficult to come up with a tainting scheme that preserved information about which labels were the cause. |
| 126 | + |
| 127 | +### Forbid all labels regardless of namespace except for a specifically allowed set |
| 128 | + |
| 129 | +* This was much more disruptive to existing usage of `--node-labels`. |
| 130 | +* This was much more difficult to integrate with other systems allowing arbitrary topology labels like CSI. |
| 131 | +* This placed restrictions on how labels outside the `kubernetes.io` and `k8s.io` label namespaces could be used, which didn't seem proper. |
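The allow-list rule from the Proposal section can be sketched as a small predicate. This is only an illustration under stated assumptions, not the `NodeRestriction` plugin's actual code: the names `kubeletMaySetLabel`, `labelNamespace`, and `inNamespace` are hypothetical, and the sketch assumes that "within the `k8s.io` and `kubernetes.io` namespaces" means the label key's prefix (the part before `/`) is one of those domains or a subdomain of one.

```go
package main

import (
	"fmt"
	"strings"
)

// Exact label keys the proposal allows kubelets to keep setting.
var allowedKeys = map[string]bool{
	"kubernetes.io/hostname":                   true,
	"failure-domain.kubernetes.io/zone":        true,
	"failure-domain.beta.kubernetes.io/zone":   true,
	"failure-domain.kubernetes.io/region":      true,
	"failure-domain.beta.kubernetes.io/region": true,
	"kubernetes.io/instance-type":              true,
	"beta.kubernetes.io/instance-type":         true,
	"kubernetes.io/os":                         true,
	"beta.kubernetes.io/os":                    true,
	"kubernetes.io/arch":                       true,
	"beta.kubernetes.io/arch":                  true,
}

// Namespaces (and their subdomains) that kubelets may use freely.
var allowedNamespaces = []string{"kubelet.kubernetes.io", "node.kubernetes.io"}

// Namespaces (and their subdomains) that are otherwise off limits.
var restrictedNamespaces = []string{"kubernetes.io", "k8s.io"}

// labelNamespace returns the prefix of a label key, or "" if it has none.
func labelNamespace(key string) string {
	if i := strings.Index(key, "/"); i >= 0 {
		return key[:i]
	}
	return ""
}

// inNamespace reports whether ns equals domain or is a subdomain of it.
func inNamespace(ns, domain string) bool {
	return ns == domain || strings.HasSuffix(ns, "."+domain)
}

// kubeletMaySetLabel is a hypothetical helper: it reports whether a kubelet
// would be allowed to self-set the given label key under the proposed rule.
func kubeletMaySetLabel(key string) bool {
	ns := labelNamespace(key)
	restricted := false
	for _, d := range restrictedNamespaces {
		if inNamespace(ns, d) {
			restricted = true
			break
		}
	}
	if !restricted {
		return true // labels outside kubernetes.io and k8s.io are untouched
	}
	if allowedKeys[key] {
		return true
	}
	for _, d := range allowedNamespaces {
		if inNamespace(ns, d) {
			return true
		}
	}
	return false
}

func main() {
	for _, key := range []string{
		"kubernetes.io/hostname",
		"foo.kubelet.kubernetes.io/bar",
		"node-restriction.kubernetes.io/dedicated",
		"foo/dedicated",
	} {
		fmt.Printf("%s allowed=%v\n", key, kubeletMaySetLabel(key))
	}
}
```

Note that under this reading `node-restriction.kubernetes.io/*` is rejected because it is not a subdomain of `node.kubernetes.io`, while a label such as `foo/dedicated` remains self-settable. Isolation labels therefore need to live under the reserved prefix to be protected.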