Adding Design Principles #596

Open
wants to merge 1 commit into base: main

robscott
Member

As the project continues to grow, it would be helpful to have some high level design principles for the project. These principles can help guide us when determining which features and work to prioritize.

/cc @ahg-g @smarterclayton @danehans @kfswain @Jeffwan @shaneutt

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 28, 2025

netlify bot commented Mar 28, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 7e469c9
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67e6e24cfd7895000846db8d
😎 Deploy Preview https://deploy-preview-596--gateway-api-inference-extension.netlify.app

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 28, 2025
Member

@shaneutt shaneutt left a comment

Thanks @robscott!

Some comments and feedback for your consideration. 🖖

Comment on lines 3 to 13
## Focus on the core interfaces

There are two interfaces of note here:

### 1. Gateway -> Endpoint Picker
At a high level, this defines how a Gateway provides information to an Endpoint Picker, and how the Endpoint Picker selects endpoint(s) that the Gateway should route to.

### 2. Endpoint Picker -> Model Server Framework
This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics.

Although we can extend these interfaces in the future, it’s critical to get these right early in the project and stabilize them as soon as possible. We want to be able to give controller and extension developers a stable target to build against.
Member

@shaneutt shaneutt Mar 28, 2025

To me this reads more like a statement of intent to deliver specific existing APIs within a time-frame of "as soon as possible". It doesn't strike me as something that you would document in "Design Principles" but instead something more like a GitHub Milestone?

At the time of writing, there are no open milestones for this project to signal to the community what's intended to be next. I suggest recasting the intent as a Milestone for delivering the endpoint picker API as GA by a specific time, with a due date on it to signal clearly to the community the specific time-frame you're trying to target.

I would expect "Design Principles" for a project like this to start with some kind of mission statement at the top: the guiding goal of the project, which the underlying design principles then serve. Perhaps:

Suggested change
## Focus on the core interfaces
There are two interfaces of note here:
### 1. Gateway -> Endpoint Picker
At a high level, this defines how a Gateway provides information to an Endpoint Picker, and how the Endpoint Picker selects endpoint(s) that the Gateway should route to.
### 2. Endpoint Picker -> Model Server Framework
This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics.
Although we can extend these interfaces in the future, it’s critical to get these right early in the project and stabilize them as soon as possible. We want to be able to give controller and extension developers a stable target to build against.
These principles guide our efforts to build flexible [Gateway API] extensions that empower the development of high-performance [AI Inference] routing technologies—balancing rapid delivery with long-term growth.
> **Note**: For simplicity, we'll refer to Gateway API Gateways which are composed together with AI
> Inference extensions as "AI Gateways" throughout this document.
[Gateway]:https://github.com/kubernetes-sigs/gateway-api
[AI Inference]:https://www.arm.com/glossary/ai-inference

LMKWYT?

Member Author

This is a great intro, thanks! I added this while also reworking the first principle to be more of a "principle" since I do think it's important. Hope this makes sense.

Comment on lines 16 to 18
## The default out of the box experience should be compelling

We want to ensure that our defaults, including our reference Endpoint Picker, are sufficiently tuned that most Inference Gateway users will have a great experience without the need for significant customization.
Member

I like this one, but I think "compelling" can be very open to interpretation, particularly if you're not a native English speaker 🤔

Consider this thought I had:

Suggested change
## The default out of the box experience should be compelling
We want to ensure that our defaults, including our reference Endpoint Picker, are sufficiently tuned that most Inference Gateway users will have a great experience without the need for significant customization.
## Our presets are finely tuned
Our defaults—shaped by extensive experience with leading model serving platforms and APIs—are designed to provide the majority of AI Gateway users with a great default experience without the need for extensive configuration or customization.

Member Author

@robscott robscott Mar 28, 2025

Updated, thanks! For now I've copied your suggestion verbatim, because I think it's an improvement over what I initially had, but I also want to go a bit broader here. I'm trying to find a way to say two things in this doc:

  1. Extensions are critical for innovation and flexibility
  2. Our default path needs to be great, so we need to continue to invest in our reference extension

This principle was meant to cover 2, and I think I still haven't communicated it quite well enough yet. I'll think about this more and iterate on it. Any additional suggestions would be appreciated.

Member

> Extensions are critical for innovation and flexibility

Ok cool, I think you can say this at the top. It's kinda a "guiding principle".

> Our default path needs to be great, so we need to continue to invest in our reference extension

Maybe:

Suggested change
## The default out of the box experience should be compelling
We want to ensure that our defaults, including our reference Endpoint Picker, are sufficiently tuned that most Inference Gateway users will have a great experience without the need for significant customization.
## Our presets are finely tuned
We provide APIs and reference implementations for the most common inference requirements. Our defaults for those APIs and implementations—shaped by extensive experience with leading model serving platforms and APIs—are designed to provide the majority of AI Gateway users with a great default experience without the need for extensive configuration or customization. If you take all of our default extensions and attach them to a compatible `Gateway`, it just "works out of the box".

Might need some more workshopping, but LMKWYT? 🤔

Comment on lines +21 to +38
## Encourage innovation via extensibility

This project is largely based on the idea that extensibility will enable innovation. With that in mind, we should make it as easy as possible for AI researchers to experiment with custom scheduling and routing logic. They should not need to know how to build a Kubernetes controller, or replicate a full networking stack. Instead, all the information needed to make a routing decision should be provided in an accessible format, with clear guidelines and examples of how to customize routing logic.
Member

Love it 👍

Contributor

> make it as easy as possible for AI researchers to experiment with custom scheduling...

Do you and others have ideas on how to achieve this goal? Should we make the scheduler extensible or define a scheduling API that EPP and potentially other extensions call?

Comment on lines +26 to +43
## Objectives over instructions

The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how _an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.
Member

This section, more than any other section, as a "Design Principle" is not resonating with me just yet. To me this reads like it's trying to talk about scope control. Could you please help me to better understand the intent here, by providing a somewhat detailed example of a situation that could occur which would run counter to this principle? I think that would help me to better understand what it's trying to convey 🤔

Contributor

One example is configuration options for the scheduling algorithm itself; some of those configuration parameters may only be relevant to the current iteration of the algorithm's implementation.

Member

Leave this as-is for now, and consider my suggestion resolved. I'll bring it up on a community call; it doesn't need to hold up the PR.

Contributor

> One example is configuration options for the scheduling algorithm itself...

It sounds like we should have a scheduler API with EPP consuming it.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 28, 2025
@robscott robscott force-pushed the design-principles branch from 5127127 to 7e469c9 Compare March 28, 2025 17:54
Member

@shaneutt shaneutt left a comment

Seems like a good place to start. I do think that some more refinement can happen, but it can be iterative and maybe we can talk a bit more about it on the community calls. 👍

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: robscott, shaneutt
Once this PR has been reviewed and has the lgtm label, please assign danehans for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


## Objectives over instructions

The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how _an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.
Collaborator

Suggested change
The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how _an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.
The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how_ an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.

This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics.


## Our presets are finely tuned
Collaborator

I think we can more clearly define this.

We want extensibility and customization. But I think it's very important that we have a turnkey solution that works for the average person.

To word it another way, I think good defaults/presets can fall under a larger umbrella of: We want a strong OOB experience for those who don't want to deeply customize. And our later points are about making this easily extensible and adaptable for those who do want to customize. Maybe that's implicit as a part of K8s.

Contributor

> We want a strong OOB experience

+1... I call this "batteries included"



## Composable components and reducing reinvention
While it may be tempting to develop an entirely new AI-focused Gateway, many essential routing capabilities are already well established by Kubernetes. Our focus is on creating a layer of composable components that can be assembled together with other Kubernetes components. This approach empowers engineers to use our solution as a building block—combining established technologies like Gateway API with our extensible model to build higher level solutions.
Collaborator

@kfswain kfswain Mar 31, 2025

Consider rewording. Perhaps this is personal bias, but the first sentence reads as if it is meant to ward off the reader from attempting to implement something new.

The later sentences focus on the value of using what has already been built, which I think is what we are going for. Perhaps move the concept of the first sentence to the end as something like:

Should you encounter a limitation, consider how existing tooling may be extended or improved first. Suggestions always welcomed (and encouraged) at our: sync-link-goes-here.

composed together with AI Inference extensions as "Inference Gateways"
throughout this document.

[Gateway]:https://github.com/kubernetes-sigs/gateway-api
Contributor

s/[Gateway]/[Gateway API]/


## Prioritize stability of the core interfaces

The most critical part of this project is the interfaces between components. To encourage both controller and extension developers to integrate with this project, we need to prioritize the stability of these interfaces.
Contributor

s/is/are/


Our defaults—shaped by extensive experience with leading model serving platforms and APIs—are designed to provide the majority of AI Gateway users with a great default experience without the need for extensive configuration or customization.


Contributor

Delete L35

## Our presets are finely tuned

Our defaults—shaped by extensive experience with leading model serving platforms and APIs—are designed to provide the majority of AI Gateway users with a great default experience without the need for extensive configuration or customization.
Contributor

> AI Gateway

We need to standardize the language in the project. From my understanding, we should use "inference gateway" instead of "AI gateway." We need to do the same with the EPP. For example, the docs refer to the EPP as the "Endpoint Selection Extension". I also refer to the EPP as ESE in kubernetes/website#49898.

Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
6 participants