Adding Design Principles #596
Thanks @robscott!
Some comments and feedback for your consideration. 🖖
> ## Focus on the core interfaces
>
> There are two interfaces of note here:
>
> ### 1. Gateway -> Endpoint Picker
>
> At a high level, this defines how a Gateway provides information to an Endpoint Picker, and how the Endpoint Picker selects endpoint(s) that the Gateway should route to.
>
> ### 2. Endpoint Picker -> Model Server Framework
>
> This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics.
>
> Although we can extend these interfaces in the future, it’s critical to get these right early in the project and stabilize them as soon as possible. We want to be able to give controller and extension developers a stable target to build against.
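To make the shape of that first interface a bit more concrete, here is a minimal, hypothetical Go sketch of the kind of contract it implies. Every name and type below is an illustrative assumption, not the project's actual API.

```go
// Hypothetical sketch of the Gateway -> Endpoint Picker contract described
// above. All names and shapes are illustrative assumptions.
package sketch

import "context"

// RequestContext carries the information a Gateway forwards to the Endpoint
// Picker for a single inference request.
type RequestContext struct {
	Model   string            // requested model name, parsed from the request body
	Headers map[string]string // relevant HTTP headers
}

// Endpoint identifies a model server the Gateway can route the request to.
type Endpoint struct {
	Address string // host:port of the selected model server
}

// EndpointPicker selects the endpoint(s) a Gateway should route a request to.
type EndpointPicker interface {
	PickEndpoints(ctx context.Context, req RequestContext) ([]Endpoint, error)
}
```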
To me this reads more like a statement of intent to deliver specific existing APIs within a time-frame of "as soon as possible". It doesn't strike me as something that you would document in "Design Principles", but instead something more like a GitHub Milestone?
At the time of writing, there are no open milestones for this project, to signal to the community what's intended to be next. I suggest re-compiling the intent as a Milestone for delivering the endpoint picker API as GA by a specific time, and put a due date on it to signal clearly to the community the specific time-frame you're trying to target.
I would expect "Design Principles" for a project like this to start with some kind of mission statement at the top which is the guiding goal of the project which the underlying design principles then serve. Perhaps:
> These principles guide our efforts to build flexible [Gateway API] extensions that empower the development of high-performance [AI Inference] routing technologies—balancing rapid delivery with long-term growth.
>
> > **Note**: For simplicity, we'll refer to Gateway API Gateways which are composed together with AI
> > Inference extensions as "AI Gateways" throughout this document.
>
> [Gateway]:https://github.com/kubernetes-sigs/gateway-api
> [AI Inference]:https://www.arm.com/glossary/ai-inference
LMKWYT?
This is a great intro, thanks! I added this while also reworking the first principle to be more of a "principle" since I do think it's important. Hope this makes sense.
> ## The default out of the box experience should be compelling
>
> We want to ensure that our defaults, including our reference Endpoint Picker, are sufficiently tuned that most Inference Gateway users will have a great experience without the need for significant customization.
I like this one, but I think "compelling" can be very open to interpretation, particularly if you're not a native English speaker 🤔
Consider this thought I had:
> ## Our presets are finely tuned
>
> Our defaults—shaped by extensive experience with leading model serving platforms and APIs—are designed to provide the majority of AI Gateway users with a great default experience without the need for extensive configuration or customization.
Updated, thanks! For now I've copied your suggestion verbatim, because I think it's an improvement over what I initially had, but I also want to go a bit broader here. I'm trying to find a way to say two things in this doc:

1. Extensions are critical for innovation and flexibility
2. Our default path needs to be great, so we need to continue to invest in our reference extension

This principle was meant to cover 2, and I don't think I've communicated it quite well enough yet; I'll think about it more and iterate. Any additional suggestions would be appreciated.
> Extensions are critical for innovation and flexibility

Ok cool, I think you can say this at the top. It's kinda a "guiding principle".

> Our default path needs to be great, so we need to continue to invest in our reference extension

Maybe:
> ## Our presets are finely tuned
>
> We provide APIs and reference implementations for the most common inference requirements. Our defaults for those APIs and implementations—shaped by extensive experience with leading model serving platforms and APIs—are designed to provide the majority of AI Gateway users with a great default experience without the need for extensive configuration or customization. If you take all of our default extensions and attach them to a compatible `Gateway`, it just "works out of the box".
Might need some more workshopping, but LMKWYT? 🤔
> ## Encourage innovation via extensibility
>
> This project is largely based on the idea that extensibility will enable innovation. With that in mind, we should make it as easy as possible for AI researchers to experiment with custom scheduling and routing logic. They should not need to know how to build a Kubernetes controller, or replicate a full networking stack. Instead, all the information needed to make a routing decision should be provided in an accessible format, with clear guidelines and examples of how to customize routing logic.
Love it 👍
> make it as easy as possible for AI researchers to experiment with custom scheduling...

Do you and others have ideas on how to achieve this goal? Should we make the scheduler extensible or define a scheduling API that EPP and potentially other extensions call?
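For the sake of discussion, here is one hypothetical shape such a scheduling API could take, sketched in Go. The names below are assumptions for illustration only, not an agreed design.

```go
// Hypothetical plugin-style scheduling API that the EPP (or other extensions)
// could call. All names are illustrative assumptions, not an agreed design.
package sketch

import "context"

// PodMetrics is the per-endpoint state a scheduler plugin can reason about,
// e.g. queue depth and KV-cache utilization reported by the model server.
type PodMetrics struct {
	Address            string
	WaitingQueueSize   int
	KVCacheUtilization float64
}

// Filter narrows the candidate endpoints; Scorer ranks what remains. A
// researcher could swap either piece without building a Kubernetes controller
// or touching the networking stack.
type Filter interface {
	Filter(ctx context.Context, candidates []PodMetrics) []PodMetrics
}

type Scorer interface {
	Score(ctx context.Context, candidate PodMetrics) float64
}
```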
> ## Objectives over instructions
>
> The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how _an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.
This section, more than any other, isn't resonating with me as a "Design Principle" just yet. To me it reads like it's trying to talk about scope control. Could you please help me better understand the intent here by providing a somewhat detailed example of a situation that would run counter to this principle? I think that would help me understand what it's trying to convey 🤔
One example is configuration options for the scheduling algorithm itself; some of those parameters may only be relevant to the current iteration of the algorithm's implementation.
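To illustrate the distinction with made-up field names: an "instruction" style API bakes in knobs of the current scheduler implementation, while an "objective" style API states the outcome to optimize for and leaves the technique to the implementation. A rough Go sketch, purely as an assumption-laden example and not the project's actual API:

```go
// Made-up API fields contrasting the two styles discussed above; these are
// illustrative assumptions, not fields of the project's actual APIs.
package sketch

// Instruction-style: tied to one iteration of the scheduling algorithm and
// likely to go stale as the implementation evolves.
type InstructionStyleSpec struct {
	KVCacheWeight    float64 `json:"kvCacheWeight"`
	QueueDepthWeight float64 `json:"queueDepthWeight"`
}

// Objective-style: expresses the intent (e.g. how critical a workload is) and
// survives changes to how the scheduler achieves it.
type ObjectiveStyleSpec struct {
	Criticality string `json:"criticality"` // e.g. "Critical" or "Sheddable"
}
```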
Leave this as-is for now, and consider my suggestion resolved. I'll bring it up on a community call; it doesn't need to hold up the PR.
> One example is configuration options for the scheduling algorithm itself...

It sounds like we should have a scheduler API with EPP consuming it.
Seems like a good place to start. I do think that some more refinement can happen, but it can be iterative and maybe we can talk a bit more about it on the community calls. 👍
> ## Objectives over instructions
>
> The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of making the API too descriptive about _how _an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation specific APIs and grow into standards as the concepts become more stable and widespread.
Suggested change: `_how _an` → `_how_ an` (fixes the broken emphasis markup).
> This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics.
>
> ## Our presets are finely tuned
I think we can more clearly define this.
We want extensibility and customization. But I think it's very important that we have a turnkey solution that works for the average person.
To word it another way: I think good defaults/presets fall under a larger umbrella of "we want a strong OOB experience for those who don't want to deeply customize", and our later points are about making this easily extensible and adaptable for those who do want to customize. Maybe that's implicit as a part of K8s.
> We want a strong OOB experience

+1... I call this "batteries included"
> ## Composable components and reducing reinvention
>
> While it may be tempting to develop an entirely new AI-focused Gateway, many essential routing capabilities are already well established by Kubernetes. Our focus is on creating a layer of composable components that can be assembled together with other Kubernetes components. This approach empowers engineers to use our solution as a building block—combining established technologies like Gateway API with our extensible model to build higher level solutions.
Consider rewording; perhaps this is personal bias, but the first sentence reads as if it's trying to ward the reader off from attempting to implement something new.
The later sentences focus on the value of using what has already been built, which I think is what we're going for. Perhaps move the concept of the first sentence to the end, as something like:

> Should you encounter a limitation, consider first how existing tooling may be extended or improved. Suggestions are always welcome (and encouraged) at our: sync-link-goes-here.
> composed together with AI Inference extensions as "Inference Gateways"
> throughout this document.
>
> [Gateway]:https://github.com/kubernetes-sigs/gateway-api
s/[Gateway]/[Gateway API]/
> ## Prioritize stability of the core interfaces
>
> The most critical part of this project is the interfaces between components. To encourage both controller and extension developers to integrate with this project, we need to prioritize the stability of these interfaces.
s/is/are/
> Our defaults—shaped by extensive experience with leading model serving platforms and APIs—are designed to provide the majority of AI Gateway users with a great default experience without the need for extensive configuration or customization.
Delete L35
> ## Our presets are finely tuned
>
> Our defaults—shaped by extensive experience with leading model serving platforms and APIs—are designed to provide the majority of AI Gateway users with a great default experience without the need for extensive configuration or customization.
> AI Gateway

We need to standardize the language in the project. From my understanding, we should use "inference gateway" instead of "AI gateway." We need to do the same with the EPP. For example, the docs refer to the EPP as the "Endpoint Selection Extension". I also refer to the EPP as ESE in kubernetes/website#49898.
As the project continues to grow, it would be helpful to have some high-level design principles for the project. These principles can help guide us when determining which features and work to prioritize.
/cc @ahg-g @smarterclayton @danehans @kfswain @Jeffwan @shaneutt