# Limiting Node Scope on the Node object

### Author: Mike Danese, (@mikedanese)

## Background

Today the node client has total authority over its own Node object. This ability
is incredibly useful for the node auto-registration flow. Some examples of
fields the kubelet self-reports in the early node object are:

1. Labels (provided by kubelet commandline)
1. Taints (provided by kubelet commandline)
1. Addresses (provided by kubelet commandline and detected from the environment)

As well as others.
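
For concreteness, here is a minimal Go sketch (using the `k8s.io/api` types) of the kind of early Node object a kubelet might self-report at registration; all concrete values are hypothetical examples, not part of this proposal:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	node := &corev1.Node{
		ObjectMeta: metav1.ObjectMeta{
			Name: "node-1",
			// Labels supplied on the kubelet command line.
			Labels: map[string]string{"foo/dedicated": "customer-info-app"},
		},
		Spec: corev1.NodeSpec{
			// Taints supplied on the kubelet command line.
			Taints: []corev1.Taint{{
				Key:    "foo/dedicated",
				Value:  "customer-info-app",
				Effect: corev1.TaintEffectNoSchedule,
			}},
		},
		Status: corev1.NodeStatus{
			// Addresses detected from the environment.
			Addresses: []corev1.NodeAddress{
				{Type: corev1.NodeInternalIP, Address: "10.0.0.1"},
			},
		},
	}
	fmt.Println(node.Name, node.Labels, node.Spec.Taints, node.Status.Addresses)
}
```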

## Problem

While this distributed method of registration is convenient and expedient, it
has two problems that a centralized approach would not have. The minor problem
is that it makes management difficult: instead of configuring labels and taints
in a centralized place, we must configure `N` kubelet command lines. More
significantly, the approach greatly compromises security. Below are two
straightforward escalations on an initially compromised node that exhibit the
attack vector.

### Capturing Dedicated Workloads

Suppose company `foo` needs to run an application that deals with PII on
dedicated nodes to comply with government regulation. A common mechanism for
implementing dedicated nodes in Kubernetes today is to set a label or taint
(e.g. `foo/dedicated=customer-info-app`) on the node and to select these
dedicated nodes in the workload controller running `customer-info-app`.
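
As an illustration, a minimal Go sketch of how such a workload might be pinned to the dedicated nodes, using a `nodeSelector` on the label and a toleration for the matching taint (the container image name is hypothetical):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	spec := corev1.PodSpec{
		Containers: []corev1.Container{{
			Name:  "customer-info-app",
			Image: "registry.example.com/customer-info-app", // hypothetical image
		}},
		// Only land on nodes carrying the dedicated label.
		NodeSelector: map[string]string{"foo/dedicated": "customer-info-app"},
		// Tolerate the matching taint so the dedicated nodes accept the pod.
		Tolerations: []corev1.Toleration{{
			Key:      "foo/dedicated",
			Operator: corev1.TolerationOpEqual,
			Value:    "customer-info-app",
			Effect:   corev1.TaintEffectNoSchedule,
		}},
	}
	fmt.Println(spec.NodeSelector)
}
```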

Since the node self-reports labels upon registration, an intruder can easily
register a compromised node with the label `foo/dedicated=customer-info-app`.
The scheduler will then bind `customer-info-app` to the compromised node,
potentially giving the intruder easy access to the PII.

This attack also extends to secrets. Suppose company `foo` runs their outward
facing nginx on dedicated nodes to reduce exposure of the company's publicly
trusted server certificates. They use the secret mechanism to distribute the
serving certificate key. An intruder captures the dedicated nginx workload in
the same way and can now use the node's credentials to read the company's
serving certificate key.

### Gaining Access to Arbitrary Serving Certificates

Suppose company `foo` uses TLS for server authentication between internal
microservices. The company uses the Kubernetes certificates API to provision
serving certificates for workload `bar`, with trust rooted in the cluster's
root certificate authority.

When [kubelet server certificate
rotation](https://github.com/kubernetes/features/issues/267) is complete, the
same API will be used to provision serving certificates for kubelets. The design
expects to cross-reference the addresses reported in the NodeStatus with the
subject alternative names in the certificate signing request in order to
validate that request.
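
A rough sketch of the kind of cross-check that design implies: every DNS and IP subject alternative name requested in the kubelet's CSR must correspond to an address the Node object already reports. Package and function names here are illustrative, not the actual implementation:

```go
package csrcheck

import (
	"crypto/x509"
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// SANsMatchNodeAddresses returns an error if the CSR requests a DNS name or IP
// that the Node object does not already report in its status addresses.
func SANsMatchNodeAddresses(csr *x509.CertificateRequest, node *corev1.Node) error {
	reported := map[string]bool{}
	for _, addr := range node.Status.Addresses {
		reported[addr.Address] = true
	}
	for _, dns := range csr.DNSNames {
		if !reported[dns] {
			return fmt.Errorf("DNS SAN %q is not an address reported by node %s", dns, node.Name)
		}
	}
	for _, ip := range csr.IPAddresses {
		if !reported[ip.String()] {
			return fmt.Errorf("IP SAN %s is not an address reported by node %s", ip, node.Name)
		}
	}
	return nil
}
```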

An intruder can easily register a node with a NodeAddress `bar` and use the
resulting certificate to MITM all traffic to service `bar` that flows through
kube-proxy on that node.

## Proposed Solution

In many environments, we can improve the situation by centralizing reporting of
these node attributes to a more trusted source and disallowing the kubelet from
reporting them itself.

We can scope down the initial Node object creation by moving to a centralized
controller model. In many deployment environments, the sensitive attributes of a
Node object discussed above ("labels", "taints", "addresses") are discoverable
by consulting a machine database (e.g. the GCE API). Using the
[initializer](admission_control_extension.md) mechanism, a centralized
controller can register an initializer for the node object and build the
sensitive fields by consulting the machine database. The
`cloud-controller-manager` is an obvious candidate to house such a controller.
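
A very rough sketch of what the core step of such a controller could look like; the `MachineDB` interface and the field shapes are assumptions made purely for illustration:

```go
package nodeinit

import (
	corev1 "k8s.io/api/core/v1"
)

// MachineInfo is whatever the machine database (e.g. the GCE API) reports for
// a machine; the exact shape is an assumption for this sketch.
type MachineInfo struct {
	Labels    map[string]string
	Taints    []corev1.Taint
	Addresses []corev1.NodeAddress
}

// MachineDB abstracts the lookup against the machine database.
type MachineDB interface {
	Lookup(nodeName string) (MachineInfo, error)
}

// populateSensitiveFields overwrites the sensitive Node fields with the machine
// database's view. The surrounding controller would then remove its pending
// initializer from the object's metadata and persist the update, making the
// Node visible to other clients.
func populateSensitiveFields(node *corev1.Node, db MachineDB) error {
	info, err := db.Lookup(node.Name)
	if err != nil {
		return err
	}
	node.Labels = info.Labels
	node.Spec.Taints = info.Taints
	node.Status.Addresses = info.Addresses
	return nil
}
```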

We can scope down subsequent updates of the Node object by moving control of
sensitive fields into the NodeRestriction admission controller. For backwards
compatibility we can begin by zeroing updates to sensitive fields and, after the
kubelet compatibility window has expired, begin to return 403s. This also has
the nice property of giving us fine-grained control over the fields and values
of updates. For example, in this model we could easily allow a kubelet to update
its `OutOfMemory` taint and disallow the kubelet from updating its `dedicated`
taint.
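
The sketch below illustrates that per-field control (it is not the actual NodeRestriction implementation): on a kubelet-initiated update, taints under protected keys revert to their previously stored values, while other taints pass through; the protected key is a hypothetical example:

```go
package noderestrict

import corev1 "k8s.io/api/core/v1"

// protectedTaintKeys lists taint keys only the central controller may manage;
// the concrete key here is a hypothetical example.
var protectedTaintKeys = map[string]bool{
	"foo/dedicated": true,
}

// filterTaintUpdate returns the taints to persist for a kubelet-initiated
// update: the requested taints for unprotected keys (e.g. an OutOfMemory-style
// taint) plus the previously stored taints for protected keys, so the
// kubelet's edits to protected keys are effectively zeroed out.
func filterTaintUpdate(oldTaints, newTaints []corev1.Taint) []corev1.Taint {
	var result []corev1.Taint
	for _, t := range newTaints {
		if !protectedTaintKeys[t.Key] {
			result = append(result, t)
		}
	}
	for _, t := range oldTaints {
		if protectedTaintKeys[t.Key] {
			result = append(result, t)
		}
	}
	return result
}
```

Once the compatibility window has expired, the same comparison could drive a 403 response rather than a silent revert.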

In this design we assume that the infrastructure has already validated the
caller's identity (e.g. through the process of TLS bootstrap) and that we can
trust that the node client is who it says it is.