Adding Design Principles #596

robscott · 2025-03-28T04:57:40Z

As the project continues to grow, it would be helpful to have some high level design principles for the project. These principles can help guide us when determining which features and work to prioritize.

/cc @ahg-g @smarterclayton @danehans @kfswain @Jeffwan @shaneutt

netlify · 2025-03-28T04:57:58Z

✅ Deploy Preview for gateway-api-inference-extension ready!

Name	Link
🔨 Latest commit	`7e469c9`
🔍 Latest deploy log	https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67e6e24cfd7895000846db8d
😎 Deploy Preview	https://deploy-preview-596--gateway-api-inference-extension.netlify.app

shaneutt

Thanks @robscott!

Some comments and feedback for your consideration. 🖖

shaneutt · 2025-03-28T13:00:39Z

site-src/concepts/design-principles.md

+## Focus on the core interfaces
+
+There are two interfaces of note here:
+
+### 1. Gateway -> Endpoint Picker
+At a high level, this defines how a Gateway provides information to an Endpoint Picker, and how the Endpoint Picker selects endpoint(s) that the Gateway should route to.
+
+### 2. Endpoint Picker -> Model Server Framework
+This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics.
+
+Although we can extend these interfaces in the future, it’s critical to get these right early in the project and stabilize them as soon as possible. We want to be able to give controller and extension developers a stable target to build against.


To me this reads more like a statement of intent to deliver specific existing APIs within a time-frame of "as soon as possible". It doesn't strike me as something that you would document in "Design Principles" but instead something more like a GitHub Milestone?

At the time of writing, there are no open milestones for this project to signal to the community what's intended to be next. I suggest recasting the intent as a Milestone for delivering the endpoint picker API as GA, with a due date on it that signals clearly to the community the specific time-frame you're trying to target.

I would expect "Design Principles" for a project like this to start with some kind of mission statement at the top which is the guiding goal of the project which the underlying design principles then serve. Perhaps:

Suggested change

## Focus on the core interfaces

There are two interfaces of note here:

### 1. Gateway -> Endpoint Picker

At a high level, this defines how a Gateway provides information to an Endpoint Picker, and how the Endpoint Picker selects endpoint(s) that the Gateway should route to.

### 2. Endpoint Picker -> Model Server Framework

This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics.

Although we can extend these interfaces in the future, it’s critical to get these right early in the project and stabilize them as soon as possible. We want to be able to give controller and extension developers a stable target to build against.

These principles guide our efforts to build flexible [Gateway API] extensions that empower the development of high-performance [AI Inference] routing technologies—balancing rapid delivery with long-term growth.

> **Note**: For simplicity, we'll refer to Gateway API Gateways which are composed together with AI
> Inference extensions as "AI Gateways" throughout this document.

[Gateway]:https://github.com/kubernetes-sigs/gateway-api

[AI Inference]:https://www.arm.com/glossary/ai-inference

LMKWYT?

This is a great intro, thanks! I added this while also reworking the first principle to be more of a "principle" since I do think it's important. Hope this makes sense.

shaneutt · 2025-03-28T13:32:51Z

site-src/concepts/design-principles.md

+## The default out of the box experience should be compelling
+
+We want to ensure that our defaults, including our reference Endpoint Picker, are sufficiently tuned that most Inference Gateway users will have a great experience without the need for significant customization.


I like this one, but I think "compelling" can be very open to interpretation, particularly if you're not a native English speaker 🤔

Consider this thought I had:

Suggested change

## The default out of the box experience should be compelling

We want to ensure that our defaults, including our reference Endpoint Picker, are sufficiently tuned that most Inference Gateway users will have a great experience without the need for significant customization.

## Our presets are finely tuned

Our defaults—shaped by extensive experience with leading model serving platforms and APIs—are designed to provide the majority of AI Gateway users with a great default experience without the need for extensive configuration or customization.

Updated, thanks! For now I've copied your suggestion verbatim, because I think it's an improvement over what I initially had, but I also want to go a bit broader here. I'm trying to find a way to say two things in this doc:

1. Extensions are critical for innovation and flexibility
2. Our default path needs to be great, so we need to continue to invest in our reference extension

This principle was meant to cover 2, and I don't think I've communicated it well enough yet. I'll think about this more and iterate on it. Any additional suggestions would be appreciated.

> Extensions are critical for innovation and flexibility

Ok cool, I think you can say this at the top. It's kinda a "guiding principle".

> Our default path needs to be great, so we need to continue to invest in our reference extension

Maybe:

Suggested change

## The default out of the box experience should be compelling

We want to ensure that our defaults, including our reference Endpoint Picker, are sufficiently tuned that most Inference Gateway users will have a great experience without the need for significant customization.

## Our presets are finely tuned

We provide APIs and reference implementations for the most common inference requirements. Our defaults for those APIs and implementations—shaped by extensive experience with leading model serving platforms and APIs—are designed to provide the majority of AI Gateway users with a great default experience without the need for extensive configuration or customization. If you take all of our default extensions and attach them to a compatible `Gateway`, it just "works out of the box".

Might need some more workshopping, but LMKWYT? 🤔

shaneutt · 2025-03-28T13:58:34Z

site-src/concepts/design-principles.md

+## Encourage innovation via extensibility
+
+This project is largely based on the idea that extensibility will enable innovation. With that in mind, we should make it as easy as possible for AI researchers to experiment with custom scheduling and routing logic. They should not need to know how to build a Kubernetes controller, or replicate a full networking stack. Instead, all the information needed to make a routing decision should be provided in an accessible format, with clear guidelines and examples of how to customize routing logic.


Love it 👍

> make it as easy as possible for AI researchers to experiment with custom scheduling...

Do you and others have ideas on how to achieve this goal? Should we make the scheduler extensible or define a scheduling API that EPP and potentially other extensions call?
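
To make the question above concrete, here is a minimal sketch of what an extensible scheduler could look like if custom logic were plugged in behind a small interface. Everything in it is an assumption for illustration: `Endpoint`, `Scorer`, `ScorerFunc`, `Pick`, and `LeastQueue` are hypothetical names, not part of this project's API, and the fields on `Endpoint` are invented stand-ins for whatever metrics an Endpoint Picker would actually expose.

```go
package main

import "fmt"

// Endpoint is a hypothetical view of one model server replica.
// The fields are illustrative, not the project's real data model.
type Endpoint struct {
	Name        string
	QueueDepth  int     // pending requests on this replica
	KVCacheUtil float64 // fraction of KV cache in use, 0.0 to 1.0
}

// Scorer is a hypothetical extension point: higher scores are preferred.
type Scorer interface {
	Score(e Endpoint) float64
}

// ScorerFunc adapts a plain function to the Scorer interface so that
// experiments can be written as a single function literal.
type ScorerFunc func(Endpoint) float64

// Score calls the underlying function.
func (f ScorerFunc) Score(e Endpoint) float64 { return f(e) }

// Pick returns the highest-scoring endpoint. Ties go to the earliest
// endpoint in the slice. The boolean is false when the slice is empty.
func Pick(s Scorer, endpoints []Endpoint) (Endpoint, bool) {
	if len(endpoints) == 0 {
		return Endpoint{}, false
	}
	best := endpoints[0]
	bestScore := s.Score(best)
	for _, e := range endpoints[1:] {
		if sc := s.Score(e); sc > bestScore {
			best, bestScore = e, sc
		}
	}
	return best, true
}

// LeastQueue prefers endpoints with fewer pending requests.
var LeastQueue = ScorerFunc(func(e Endpoint) float64 {
	return -float64(e.QueueDepth)
})

func main() {
	endpoints := []Endpoint{
		{Name: "pod-a", QueueDepth: 4, KVCacheUtil: 0.30},
		{Name: "pod-b", QueueDepth: 1, KVCacheUtil: 0.90},
	}

	// An experimental scorer written inline: prefer free KV cache.
	leastCache := ScorerFunc(func(e Endpoint) float64 {
		return 1 - e.KVCacheUtil
	})

	byQueue, _ := Pick(LeastQueue, endpoints)
	byCache, _ := Pick(leastCache, endpoints)
	fmt.Println(byQueue.Name, byCache.Name)
}
```

In this sketch, trying a new routing idea means writing one function; no controller or networking code is involved. Whether that interface lives inside the EPP or behind a separate scheduling API is exactly the open question.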

shaneutt · 2025-03-28T14:03:21Z

site-src/concepts/design-principles.md

+## Objectives over instructions
+
+The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how _an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.


This section, more than any other section, as a "Design Principle" is not resonating with me just yet. To me this reads like it's trying to talk about scope control. Could you please help me to better understand the intent here, by providing a somewhat detailed example of a situation that could occur which would run counter to this principle? I think that would help me to better understand what it's trying to convey 🤔

One example is configuration options for the scheduling algorithm itself; some of those configuration parameters may only be relevant to the current iteration of the algorithm implementation.

Leave this as-is for now, and consider my suggestion resolved. I'll bring it up for a community call, doesn't need to hold up the PR.

> One example is configuration options for the scheduling algorithm itself...

It sounds like we should have a scheduler API with EPP consuming it.
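
As a rough illustration of the example discussed in this thread, the sketch below separates an objective that an API could expose from tuning parameters that belong to one iteration of an algorithm. All names here are hypothetical (`Objective`, `TargetLatencyMs`, `Algorithm`, `queueHeuristic`, `msPerQueued`, `observedLatency`); none of them come from this project, and the latency target is only an assumed stand-in for whatever objectives the API ends up describing.

```go
package main

import "fmt"

// Objective is what a hypothetical API would expose: the outcome the
// user wants, with no reference to how it is achieved.
type Objective struct {
	TargetLatencyMs int
}

// Endpoint carries illustrative signals an implementation might use.
type Endpoint struct {
	Name            string
	QueueDepth      int
	RecentLatencyMs int
}

// Algorithm is implementation-side: it decides whether an endpoint is
// expected to meet the objective.
type Algorithm interface {
	Eligible(o Objective, e Endpoint) bool
}

// queueHeuristic estimates latency from queue depth. msPerQueued is a
// tuning knob that only makes sense for this implementation, so in this
// sketch it stays out of the API.
type queueHeuristic struct{ msPerQueued int }

func (q queueHeuristic) Eligible(o Objective, e Endpoint) bool {
	return e.QueueDepth*q.msPerQueued <= o.TargetLatencyMs
}

// observedLatency is a later iteration that uses measured latency and
// needs no knob at all. The Objective is unchanged.
type observedLatency struct{}

func (observedLatency) Eligible(o Objective, e Endpoint) bool {
	return e.RecentLatencyMs <= o.TargetLatencyMs
}

func main() {
	o := Objective{TargetLatencyMs: 200}
	e := Endpoint{Name: "pod-a", QueueDepth: 3, RecentLatencyMs: 250}

	algorithms := []Algorithm{queueHeuristic{msPerQueued: 50}, observedLatency{}}
	for _, a := range algorithms {
		fmt.Printf("%T eligible=%v\n", a, a.Eligible(o, e))
	}
}
```

Had `msPerQueued` been part of the API, replacing the heuristic with `observedLatency` would have left a field that no longer means anything, which is the kind of drift the principle is trying to avoid.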

site-src/concepts/design-principles.md

shaneutt

Seems like a good place to start. I do think that some more refinement can happen, but it can be iterative and maybe we can talk a bit more about it on the community calls. 👍

k8s-ci-robot · 2025-03-28T19:31:21Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: robscott, shaneutt
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

kfswain · 2025-03-31T03:27:42Z

site-src/concepts/design-principles.md

+
+## Objectives over instructions
+
+The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how _an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.


Suggested change

The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how _an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.

The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how_ an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.

kfswain · 2025-03-31T03:35:28Z

site-src/concepts/design-principles.md

+This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics.
+
+
+## Our presets are finely tuned


I think we can more clearly define this.

We want extensibility, and customization. But I think it's very important that we have a turnkey solution that works for the average person.

To word it another way, I think good defaults/presets can fall under a larger umbrella of: We want a strong OOB experience for those who don't want to deeply customize. And our later points are about making this easily extensible and adaptable for those who do want to customize. Maybe that's implicit as a part of K8s.

> We want a strong OOB experience

+1... I call this "batteries included"

kfswain · 2025-03-31T03:43:06Z

site-src/concepts/design-principles.md

+
+
+## Composable components and reducing reinvention
+While it may be tempting to develop an entirely new AI-focused Gateway, many essential routing capabilities are already well established by Kubernetes. Our focus is on creating a layer of composable components that can be assembled together with other Kubernetes components. This approach empowers engineers to use our solution as a building block—combining established technologies like Gateway API with our extensible model to build higher level solutions.


Consider rewording. Perhaps this is personal bias, but the first sentence reads as if it is meant to ward off the reader from attempting to implement something new.

The later sentences focus on the value of using what has already been built, which I think is what we are going for. Perhaps move the concept of the first sentence to the end as something like:

> Should you encounter a limitation, consider how existing tooling may be extended or improved first. Suggestions always welcomed (and encouraged) at our: sync-link-goes-here.

danehans · 2025-04-03T16:10:24Z

site-src/concepts/design-principles.md

+    composed together with AI Inference extensions as "Inference Gateways"
+    throughout this document.
+
+[Gateway]:https://github.com/kubernetes-sigs/gateway-api


s/[Gateway]/[Gateway API]/

danehans · 2025-04-03T16:10:46Z

site-src/concepts/design-principles.md

+
+## Prioritize stability of the core interfaces
+
+The most critical part of this project is the interfaces between components. To encourage both controller and extension developers to integrate with this project, we need to prioritize the stability of these interfaces.


danehans · 2025-04-03T16:13:18Z

site-src/concepts/design-principles.md

+
+Our defaults—shaped by extensive experience with leading model serving platforms and APIs—are designed to provide the majority of AI Gateway users with a great default experience without the need for extensive configuration or customization.
+
+


danehans · 2025-04-03T16:14:02Z

site-src/concepts/design-principles.md

+This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics.
+
+
+## Our presets are finely tuned


> We want a strong OOB experience

+1... I call this "batteries included"

danehans · 2025-04-03T16:16:16Z

site-src/concepts/design-principles.md

+
+## Our presets are finely tuned
+
+Our defaults—shaped by extensive experience with leading model serving platforms and APIs—are designed to provide the majority of AI Gateway users with a great default experience without the need for extensive configuration or customization.


> AI Gateway

We need to standardize the language in the project. From my understanding, we should use "inference gateway" instead of "AI gateway." We need to do the same with the EPP. For example, the docs refer to the EPP as the "Endpoint Selection Extension". I also refer to the EPP as ESE in kubernetes/website#49898.

danehans · 2025-04-03T16:55:05Z

site-src/concepts/design-principles.md

+## Encourage innovation via extensibility
+
+This project is largely based on the idea that extensibility will enable innovation. With that in mind, we should make it as easy as possible for AI researchers to experiment with custom scheduling and routing logic. They should not need to know how to build a Kubernetes controller, or replicate a full networking stack. Instead, all the information needed to make a routing decision should be provided in an accessible format, with clear guidelines and examples of how to customize routing logic.


> make it as easy as possible for AI researchers to experiment with custom scheduling...

Do you and others have ideas on how to achieve this goal? Should we make the scheduler extensible or define a scheduling API that EPP and potentially other extensions call?

danehans · 2025-04-03T17:01:32Z

site-src/concepts/design-principles.md

+## Objectives over instructions
+
+The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how _an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.


> One example is configuration options for the scheduling algorithm itself...

It sounds like we should have a scheduler API with EPP consuming it.

k8s-ci-robot requested review from ahg-g, danehans, Jeffwan, kfswain, shaneutt and smarterclayton March 28, 2025 04:57

k8s-ci-robot added the `cncf-cla: yes` (indicates the PR's author has signed the CNCF CLA) and `size/M` (denotes a PR that changes 30-99 lines, ignoring generated files) labels Mar 28, 2025

k8s-ci-robot added the `do-not-merge/hold` label (indicates that a PR should not merge because someone has issued a /hold command) Mar 28, 2025

shaneutt suggested changes Mar 28, 2025


k8s-ci-robot removed the `do-not-merge/hold` label (indicates that a PR should not merge because someone has issued a /hold command) Mar 28, 2025

Adding Design Principles

7e469c9

robscott force-pushed the design-principles branch from 5127127 to 7e469c9 Compare March 28, 2025 17:54

shaneutt approved these changes Mar 28, 2025


kfswain reviewed Mar 31, 2025


danehans reviewed Apr 3, 2025
