feat(conformance): Add HTTPRouteMultipleRulesDifferentPools test #834
# --- HTTPRoute Definition for Multiple Pools ---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: httproute-multi-pool-rules
  namespace: gateway-conformance-app-backend
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: conformance-gateway
    namespace: gateway-conformance-infra
    sectionName: http
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /app-a
    backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: pool-a
      port: 8080
      weight: 1
  - matches:
    - path:
        type: PathPrefix
        value: /app-b
    backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: pool-b
      port: 8080
      weight: 1
Adding a comment here. It's not really about conformance; it's more a general comment on the approach we take for multi-pool setups (though it might affect conformance too).
Currently, when using a single EPP and sending an inference request, the model name is specified in the body. We have BBR, which can take that and also put it in a header. Several issues here:
- To align with how BBR works, it might be better for this test to use a header matcher rather than a path prefix.
- The bigger issue: even if the previous point is addressed, the ModelName we get in the request body is the LoRA adapter name, not the base model name. We need to somehow work out the correct EPP from the specified LoRA name. Otherwise we push the problem to the user, who has to set a prefix matcher (or header matcher) that matches the correct pool. That doesn't sound like good UX to me.
The UX I'm expecting is:
The user sends a request the same way as with a single pool.
Some component logic is triggered BEFORE calling the EPP (similar to how BBR works). That logic is responsible for injecting the right header/path matcher. Then the flow continues as usual.
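A minimal sketch of what such a pre-EPP step could look like, assuming a static lookup table from LoRA adapter name to pool. Everything in it is hypothetical and for illustration only: the header name `X-Example-Target-Pool`, the adapter names, and the `resolvePool` helper are not part of BBR, EPP, or this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// targetPoolHeader is a hypothetical header name, not one defined by BBR or EPP.
const targetPoolHeader = "X-Example-Target-Pool"

// requestBody captures only the field this sketch needs from the request body.
type requestBody struct {
	Model string `json:"model"`
}

// resolvePool reads the model name from the body and looks up which pool
// serves it. The adapterToPool table is an assumption: in practice something
// would have to populate it from the pools' configuration.
func resolvePool(body []byte, adapterToPool map[string]string) (string, error) {
	var req requestBody
	if err := json.Unmarshal(body, &req); err != nil {
		return "", fmt.Errorf("parsing request body: %w", err)
	}
	pool, ok := adapterToPool[req.Model]
	if !ok {
		return "", fmt.Errorf("no pool known for model %q", req.Model)
	}
	return pool, nil
}

func main() {
	// Hypothetical adapter names mapped to the two pools from the HTTPRoute.
	adapterToPool := map[string]string{
		"adapter-x": "pool-a",
		"adapter-y": "pool-b",
	}
	body := []byte(`{"model": "adapter-y", "prompt": "hello"}`)

	pool, err := resolvePool(body, adapterToPool)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The header would then be matched by an HTTPRoute header matcher.
	fmt.Printf("%s: %s\n", targetPoolHeader, pool)
}
```

With a step like this in place, the user's request stays identical to the single-pool case, and the route rules match on the injected header instead of a user-chosen path prefix.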
Hey @nirrozenbaum, I think the idea here is that we're trying to test splitting two entirely different and disconnected InferencePools. So the /app-a and /app-b prefixes are not meant to be model names, but instead just generic ways to route to different InferencePools. You could theoretically match on any arbitrary request attribute such as header or query param, but I'd hate to have this confused with matching on a model name, as that's not really the intent of the test.
I do think it would be useful to add an e2e test in the future that includes BBR, but that would require us to have a portable way to configure BBR + use it in some conformance tests like this. I definitely want to get to that point, but I think it will take some time.
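To make the intent concrete, here is a rough sketch of the routing decision the two rules express: the path prefix alone picks the pool, and no model name is involved. This is only an illustration under stated assumptions. The `rule` type and both helpers are hypothetical, the real matching is done by the Gateway implementation, and first-match-wins is a simplification that is only safe here because the two prefixes cannot overlap.

```go
package main

import (
	"fmt"
	"strings"
)

// rule pairs a path prefix with the pool it routes to (hypothetical type).
type rule struct {
	prefix string
	pool   string
}

// matchesPrefix compares path elements rather than raw characters, so
// "/app-a" matches "/app-a" and "/app-a/v1" but not "/app-ab".
func matchesPrefix(path, prefix string) bool {
	prefix = strings.TrimSuffix(prefix, "/")
	if path == prefix {
		return true
	}
	return strings.HasPrefix(path, prefix+"/")
}

// poolForPath returns the pool of the first rule whose prefix matches.
func poolForPath(rules []rule, path string) (string, bool) {
	for _, r := range rules {
		if matchesPrefix(path, r.prefix) {
			return r.pool, true
		}
	}
	return "", false
}

func main() {
	// Mirrors the two rules in the HTTPRoute under review.
	rules := []rule{
		{prefix: "/app-a", pool: "pool-a"},
		{prefix: "/app-b", pool: "pool-b"},
	}
	for _, path := range []string{"/app-a/v1/completions", "/app-b", "/app-ab", "/other"} {
		pool, ok := poolForPath(rules, path)
		fmt.Printf("%-24s matched=%-5t pool=%q\n", path, ok, pool)
	}
}
```

A header or query-parameter matcher would slot into the same shape. Only the attribute being compared changes, which is why the test treats the prefixes as generic selectors.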
Fair enough.
/cc
/ok-to-test
Description
This PR introduces a new conformance test, HTTPRouteMultipleRulesDifferentPools, which validates that a setup with one Gateway and one HTTPRoute successfully routes traffic to multiple, distinct InferencePool backends.
Local run results (ran on commit 2a67e7):