-
Notifications
You must be signed in to change notification settings - Fork 573
gateway-api: add experimental API for Istio extensions #2770
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
😊 Welcome @frankbu! This is either your first contribution to the Istio api repo, or it's been You can learn more about the Istio working groups, code of conduct, and contributing guidelines Thanks for contributing! Courtesy of your friendly welcome wagon. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
I think there are a few things to figure out:
- If we should do this at all, of course
- Policy attachment or filter.
- If this is a temporary thing until upstreamed, or a permanent API (although some will move, of course)
a. If its temporary, how we make sure it stays temporary
b. If its permanent, how we will handle fields being upstreamed
I think we have to. If we want to get users to start moving off of the Istio API relatively soon I think we need to have at least some way to handle the functional gaps. Waiting for upstream, for some of these things at least, will hold us up too long IMO.
In general, I think we'll need some of both, but unless there is a plan to deprecate extensionRef upstream (in favor of policy attachments), I think route filter is the best way to handle at least for the three extensions I've added in this PR.
My 2 cents. I think it should be considered permanent, and we just deprecate things as soon as there is a way to do it upstream. For the very short term, maybe we should not generate docs for this, just use it internally for experimentation and prototyping. Then in a few months we can start to pick and chose what to document vs hide_from_docs as things become more clear? |
Hey @frankbu, would you be interested in pushing these concepts forward in Gateway API? I haven't spent as much time looking into fault injection, but retry configuration is clearly portable and there seems to be consensus in the community to move forward (as tracked by kubernetes-sigs/gateway-api#1731). Might be worth starting with a GEP for that. |
@robscott, yes, I could do that, but I'm also trying to solve the general problem of all the Istio things that are missing in Gateway API, some of which I'm thinking will not make it into upstream any time soon if ever, but maybe that's not the case for the 3 specific features I'm starting with in this PR? |
Hey @frankbu I missed timeouts in the scope when I first looked at this. Those are also likely to be covered by kubernetes-sigs/gateway-api#1742. Wherever possible, I think it's a win for everyone if we can redirect efforts spent on implementation-specific APIs into portable Gateway API features. I think that's very likely to be the case for both retries and timeouts, I'm not as sure about fault injection. If you're able to help push these concepts into the main API that would help improve Gateway API and avoid future conflicts in Istio when these features do make it in. |
google.protobuf.Duration timeout = 1; | ||
|
||
// Retry policy for HTTP requests. | ||
istio.networking.v1beta1.HTTPRetry retries = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Istio default retry is 3. What's the default semantics with this config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we push this to upstream Gateway API, I think the default would be implementation dependent. Unless I'm missing something, I think the Istio default of 3 is not specified in the API either, it just happens to be 3 with default settings of all the related fields: timeout
, attempts
, perTryTimeout
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the default be implementation dependent? This has a big implication on reliability of the data plane.
There are actually two defaults-related questions:
- What happens if I don't specify the override in any routes? Does it still set to 3, and how can I change it?
- What happens if I specify the override but omit the field?
// --> | ||
message RouteFilter { | ||
// Timeout for HTTP requests, default is disabled. | ||
google.protobuf.Duration timeout = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this downstream or upstream time out? Is it protocol specific time out (http/1.1 or h2)? Is it a stream or a whole HTTP connection time out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the HTTP upstream request timeout, the same thing we currently configure in the http route of an Istio VirtualService. I realize there are other timeouts that we will need to be able to configure, but those are more likely to be done using Gateway API PolicyAttachement, instead of in an HTTPRouteFilter extension like this one. This PR is just intended to be a start of filling in the Istio vs. Gateway API functionality gaps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there are two routes with different timeouts? If it's the upstream timeout, then that would prevent sharing the upstream cluster, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I think I meant it's the downstream timeout. It's configured per client request to the upstream cluster. The istio equivalent config is the timeout
field in here: https://istio.io/latest/docs/reference/config/networking/virtual-service/#HTTPRoute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack, please make this explicit as "downstream request timeout". There are many timeouts working together (I think 4 of them), and Istio ambiguity is not helpful.
// Fault injection policy to apply on HTTP traffic at the client side. | ||
// Note that timeouts or retries will not be enabled when faults are | ||
// enabled on the client side. | ||
istio.networking.v1beta1.HTTPFaultInjection fault = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think faults should be mingled with protocol settings. Faults are a data plane extension and should be layered as such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I have them together is because all 3 fields in this PR are 3 missing route configuration fields that we currently have together in Istio's HTTPRoute resource. So, I put them together so we only need to define a single Istio CRD for extending Gateway API with any of these properties.
It's a good point, though. If we push these things up to Gateway API proper, instead of using an extensionRef resource, then I think they will all be separate filters: RequestRetryFilter
, RequestTimeoutFilter
, and RequestFaultFilter
, which would be better (esp. for faults).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'd expect Faults to be certainly separate from timeout/retries. It's a different persona, too.
I think there are 2 separate issues:
For the first point - I think the slow process of getting vendor consensus and review is very important. For the second - which I believe is the intent of this API - I think we should first make sure that current Istio APIs Either way - I don't think defining another round of Istio APIs is the right approach. |
In other words: given an Istio VirtualService and a Gateway HttpRoute that generates an internal VirtualService - can we merge the 2 in some way, so Istio VirtualService acts as an extension ? Some 'section id' and naming rules ? |
Yes, I had thought about that. I think it could be done either as an kind: HTTPRoute
metadata:
name: reviews
spec:
parentRefs:
- kind: Service
name: reviews
port: 9080
rules:
- filters:
- type: ExtensionRef
extensionRef:
group: networking.istio.io/v1beta1
kind: VirtualService
name: reviews-v1
backendRefs:
- name: reviews-v1
port: 9080
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: reviews-v1
annotations:
gateway.istio.io/extension: "true"
spec:
http:
- timeout: 10s
retries:
attempts: 3
perTryTimeout: 2s
retryOn: connect-failure,refused-stream,503 If, on the other hand, we want it to look more like a kind: HTTPRoute
metadata:
name: reviews
spec:
parentRefs:
- kind: Service
name: reviews
port: 9080
rules:
- backendRefs:
- name: reviews-v1
port: 9080
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
name: reviews-v1
annotations:
gateway.istio.io/extension: "kind: HTTPRoute, name: reviews, field: rules[0]"
spec:
http:
- timeout: 10s
retries:
attempts: 3
perTryTimeout: 2s
retryOn: connect-failure,refused-stream,503 Thoughts? |
@frankbu I agree with Costin that we should find a way to define attachment for existing API resources without defining yet another API - the real API should be cross-vendor but would take longer to settle. I think it would be best for the attachment to be consistent with the real API - whatever that is. Do we have any proposals for the "workload selector" replacement? This has come up for Waypoints in Ambient binding Authorization policies - I think we need a Gateway approach as well and it should be consistent. The same solution will generalize to Wasm, Telemetry, and EnvoyFilter, which all use the workload selectors. |
@kyessenov Gateway defines a struct that policy attachments are supposed to use to reference Gateway resources. It's currently not able to identify specific sections, but I think that's going to be needed. I wonder if we should be consistent with it for Istio Gateway attachments? https://github.com/kubernetes-sigs/gateway-api/blob/c67f3fd31976ea74ea06a78c2a88ae38df95a41d/apis/v1alpha2/policy_types.go#L24 |
@frankbu Is the idea that we use this PolicyAttachment struct as a field in VirtualService and other Istio APIs? |
I don't even think we need annotation. VirtualService can attach to a gateway by name - we can add a section I don't mind if we add a parentRef field to VirtualService as well - but that will take time to propagate ( CRD changes For Gateway - I was just talking with John it is already possible, just need Istio Gateway to use the selector DestinationRule - it works already since it attaches by name. We need to make it clear that in K8S Gateway ServiceEntry is standalone and defines the equivalent of a Service - so I don't think there are problems, we may revisit In general - I would minimize the changes in Istio - we can add the parentRef and other fields just to look |
And I don't think we need to make huge changes for this - the goal is for users to start primarily interacting with More important would be to update our docs - we were discussing having a new set of docs focused on At the same time - we should contribute upstream and try to get as much functionality as we can in |
@kyessenov Yes, that's what I was thinking, but Costin's comment:
makes me think maybe we should instead try to stick with the current CRDs and just use name matching tricks to connect them, if it really is just a short term issue. I had been going on the assumption that some of Istio's features will never make it into Gateway API, so we should have a clean API for the extensions, but maybe we should wait and see first. |
@costinm Just to clarify, I think you are suggesting that for my example (a route with timeout/retry) we would not use an |
Yes, if you need a feature not provided by Gateway - user can continue with VirtualService until upstream CRD exists. It is true that some Istio features will never make it to the Gateway - that's one of the reasons there is no plan Will also provide a gradual and smooth migration. |
Closing this in favor of:
|
No description provided.