KEP-5328: Node Capabilities #5347

pravk03 · 2025-05-28T00:45:56Z

One-line PR description: Add the initial KEP for KEP 5328: Node Capabilities

Issue link: Node Capabilities #5328

Other comments:

k8s-ci-robot · 2025-05-28T00:46:05Z

Welcome @pravk03!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2025-05-28T00:46:06Z

Hi @pravk03. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot · 2025-05-28T00:46:07Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: pravk03
Once this PR has been reviewed and has the lgtm label, please assign dchen1107 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

keps/sig-node/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

wojtek-t

@dom4ha @sanposhiho @macsko - FYI

wojtek-t · 2025-05-28T11:28:00Z

keps/sig-node/5328-node-capabilities/README.md

+### API Changes
+
+Add a `NodeCapabilities` field of type `map[string]string` to the `NodeSpec.NodeStatus` structure.


Should this really be string=>string?
To be useful, each entry needs to carry semantics, and the scheduler needs to understand them in exactly the same way.

I don't have a counterproposal for it - but maybe we can somehow couple that with certain pod features?

The string=>string format was chosen for its flexibility, similar to node labels and annotations. This allows us to represent diverse information types and easily add new capabilities without requiring API schema changes.

I agree that clear semantic understanding by the scheduler is necessary. Could this be achieved through consistent, DNS-style keys, where the naming convention itself (e.g. kubernetes.io/feature/featureName) effectively defines each capability?
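
To make the DNS-style key idea above concrete, here is a minimal Go sketch of how a consumer might recognize capability keys by convention. The prefix is taken from the example in the reply; the `FeatureName` helper, the `NodeCapabilities` type name, and the sample keys are assumptions made for illustration and are not part of the KEP.

```go
package main

import (
	"fmt"
	"strings"
)

// capabilityPrefix follows the example key in the discussion
// (kubernetes.io/feature/featureName). The KEP does not fix this prefix;
// it is an assumption for this sketch.
const capabilityPrefix = "kubernetes.io/feature/"

// NodeCapabilities mirrors the proposed map[string]string shape.
type NodeCapabilities map[string]string

// FeatureName returns the feature name encoded in a capability key.
// It returns false when the key does not follow the assumed convention.
func FeatureName(key string) (string, bool) {
	name, ok := strings.CutPrefix(key, capabilityPrefix)
	if !ok || name == "" || strings.Contains(name, "/") {
		return "", false
	}
	return name, true
}

func main() {
	caps := NodeCapabilities{
		"kubernetes.io/feature/ExampleFeature": "true", // hypothetical capability
		"example-unprefixed-key":               "true", // skipped: wrong convention
	}
	for key, value := range caps {
		if name, ok := FeatureName(key); ok {
			fmt.Printf("%s=%s\n", name, value)
		}
	}
}
```

A convention like this only settles how keys are named. The meaning of each value still has to be agreed on by the kubelet and the scheduler, which is the concern raised in the review comment.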

wojtek-t · 2025-05-28T11:30:48Z

keps/sig-node/5328-node-capabilities/README.md

+*   Requires modifying a core Kubernetes API object, leading to complexities in versioning, upgrades, and maintenance. Extending the types of capabilities might require further core API changes. 
+*   Would make the `NodeStatus` object larger and less focused on just the operational status. 
+*   Scalability 
+    *   Updating `NodeCapabilities` with every `NodeStatus` update is a waste of network resources and API server processing because the information in NodeCapabilities doesn't change frequently. A large NodeCapabilities field, especially with many features or resources, significantly increases the size of the NodeStatus object. 


How many capabilities do we expect here?

Once we decide to publish capability X in the status, are we effectively committing to publishing it forever? Or do we plan to stop publishing it after some trigger (whether an event or a time limit)? If we never trim capabilities, we risk growing the object forever.

Some capabilities, like those reflecting stable kernel or runtime features, are expected to be long-lived and would persist as long as they remain relevant on the node. Capabilities tied to Kubelet features in alpha or beta stages can be automatically deprecated after the feature becomes GA.

It's challenging to estimate the number of capabilities, but the proposal restricts publishing to information that is actionable by control plane components (scheduler or admission controllers), which is designed to keep this number manageable.

wojtek-t · 2025-05-28T11:31:16Z

keps/sig-node/5328-node-capabilities/kep.yaml

+  - "@pravk03"
+owning-sig: sig-node
+participating-sigs:
+  - sig-node


You should put sig-scheduling here and have a reviewer from that sig.

dom4ha · 2025-05-28T13:24:35Z

keps/sig-node/5328-node-capabilities/README.md

+
+This proposal includes **"Node Capabilities"** as scheduling mechanism in Kubernetes ensuring pods can run on nodes reliably on Nodes where they are scheduled while reducing the operational burden. It provides a standardized way for Kubelet to advertise specific node features and configurations, decreasing reliance on manual taints and labels for scheduling decisions.
+
+NodeCapabilities aims to prevent pods from being scheduled on incompatible nodes - those missing necessary features because of version skew between control plane and the Node or unsupported runtime/kernel configurations ([slack discussion](https://kubernetes.slack.com/archives/C5P3FE08M/p1741867194258139)). Making the scheduler aware of specific node capabilities will enable more reliable pod placement and ensure that incompatibilities are proactively identified as scheduling failures.


Should it prevent pods from being scheduled, or from being bound? We plan to separate workload scheduling from binding, and scheduling may start considering pods that do not exist yet and, likewise, Nodes that are not ready yet.

/cc @wojtek-t @x13n WDYT?

This isn't dynamic - so we don't expect this to change anytime soon. So it should prevent scheduling.

IIUC some capabilities are dynamic: the ones that signify the presence of necessary daemons. Newly turned-up nodes would always start with such capabilities "missing", but only temporarily, which should at least block binding.

To avoid any race conditions, the key requirement for NodeCapabilities is to include static configurations available during kubelet bootstrap. We should not really have temporarily missing capabilities.

dom4ha · 2025-05-28T13:33:21Z

keps/sig-node/5328-node-capabilities/README.md

+### Non-Goals
+
+1. Replace taints/tolerations or node labels to aid with the scheduling decisions.
+2. This KEP focuses on introducing the NodeCapabilities API. The exact details of how specific Node Capabilities should be mapped to workload requirements is use case specific and out of scope for this KEP. 


How can the scheduling logic be constructed without a specification of which pod needs which capabilities? I think without it, the proposal is incomplete.

I added some details on the kube-scheduler changes necessary and included example workflows in the Design section. PTAL.

dom4ha · 2025-05-28T13:34:19Z

keps/sig-node/5328-node-capabilities/README.md

+### Goals
+
+1. Define a standard mechanism for Nodes to expose Kubelet, Runtime, and Kernel configurations that are pertinent to workload scheduling and/or improve API Request validation. 
+2. Enhance the kube-scheduler to understand pod requirements and match them against Node capabilities and place pods on compatible nodes. 


Can you give some examples in the User Stories of how pod requirements may map to specific capabilities?

Thanks for the suggestion. Added some examples.

pravk03 · 2025-05-29T00:25:30Z

/cc @tallclair @yujuhong

sanposhiho · 2025-05-29T20:59:40Z

/sig scheduling

haircommander · 2025-05-30T18:15:16Z

keps/sig-node/5328-node-capabilities/README.md

+
+`NodeCapabilityFilter` plugin would 
+
+*   Inspect the PodSpec to infer a set of required NodeCapability key-value pairs.


This inference is confusing to me. How will the scheduler know which features in a pod map to which capability on a node? Is the mapping going to be declarative within Kubernetes, or imperative?
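
One way to picture the declarative option raised above is a table that maps pod-spec predicates to required capability key-value pairs, which a filter then compares against what the node advertises. The Go sketch below is a rough illustration under stated assumptions: `PodSpec` is a simplified stand-in rather than the real Kubernetes type, and `UsesExampleFeature`, `RequiredCapabilities`, `MissingCapabilities`, and the capability key are all hypothetical names. It does not use the real scheduler framework and is not the `NodeCapabilityFilter` design from the KEP.

```go
package main

import (
	"fmt"
	"sort"
)

// PodSpec is a simplified stand-in for the real Kubernetes PodSpec.
// The field is hypothetical and exists only for this illustration.
type PodSpec struct {
	UsesExampleFeature bool
}

// requirement pairs a predicate over the pod with the capability it implies.
type requirement struct {
	applies func(PodSpec) bool
	key     string
	value   string
}

// requirements is the declarative table; its single entry is hypothetical.
var requirements = []requirement{
	{
		applies: func(p PodSpec) bool { return p.UsesExampleFeature },
		key:     "kubernetes.io/feature/ExampleFeature",
		value:   "true",
	},
}

// RequiredCapabilities infers the capability key-value pairs a pod needs.
func RequiredCapabilities(pod PodSpec) map[string]string {
	required := map[string]string{}
	for _, r := range requirements {
		if r.applies(pod) {
			required[r.key] = r.value
		}
	}
	return required
}

// MissingCapabilities returns the sorted keys that the node does not
// advertise with the required value. An empty result means the pod fits.
func MissingCapabilities(required, node map[string]string) []string {
	var missing []string
	for key, want := range required {
		if got, ok := node[key]; !ok || got != want {
			missing = append(missing, key)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	pod := PodSpec{UsesExampleFeature: true}
	required := RequiredCapabilities(pod)

	oldNode := map[string]string{} // advertises nothing
	newNode := map[string]string{"kubernetes.io/feature/ExampleFeature": "true"}

	fmt.Println("old node missing:", MissingCapabilities(required, oldNode))
	fmt.Println("new node missing:", MissingCapabilities(required, newNode))
}
```

Keeping the mapping in a table makes the pod-to-capability relationship reviewable in one place. An imperative variant would instead embed each check in plugin code.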

k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory labels May 28, 2025

k8s-ci-robot requested review from dchen1107 and derekwaynecarr May 28, 2025 00:46

k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 28, 2025

k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 28, 2025

pravk03 marked this pull request as draft May 28, 2025 00:47

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 28, 2025

pravk03 force-pushed the node-capabilities branch 2 times, most recently from 59e7e54 to 4719180 Compare May 28, 2025 00:59

wojtek-t reviewed May 28, 2025

dom4ha reviewed May 28, 2025

pravk03 force-pushed the node-capabilities branch 3 times, most recently from 4c11e06 to 9254f9b Compare May 28, 2025 23:11

pravk03 changed the title ~~KEP-5328: Node Capability Aware Scheduling~~ KEP-5328: Node Capabilities May 28, 2025

pravk03 marked this pull request as ready for review May 28, 2025 23:14

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label May 28, 2025

k8s-ci-robot requested a review from mrunalp May 28, 2025 23:14

k8s-ci-robot requested review from tallclair and yujuhong May 29, 2025 00:25

pravk03 force-pushed the node-capabilities branch from 9254f9b to f8291a4 Compare May 29, 2025 01:06

k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label May 29, 2025

github-project-automation bot added this to SIG Scheduling May 29, 2025

github-project-automation bot moved this to Needs Triage in SIG Scheduling May 29, 2025

pravk03 force-pushed the node-capabilities branch 2 times, most recently from a1bd12b to 26e03c8 Compare May 30, 2025 00:44

KEP-5328: Introduce Node Capabilities KEP

9dc58f7

pravk03 force-pushed the node-capabilities branch from 26e03c8 to 9dc58f7 Compare May 30, 2025 00:55

haircommander reviewed May 30, 2025


