Skip to content

gateway-api: add experimental API for Istio extensions #2770

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

frankbu
Copy link
Contributor

@frankbu frankbu commented May 1, 2023

No description provided.

@frankbu frankbu requested a review from a team as a code owner May 1, 2023 15:11
@istio-policy-bot
Copy link

😊 Welcome @frankbu! This is either your first contribution to the Istio api repo, or it's been
awhile since you've been here.

You can learn more about the Istio working groups, code of conduct, and contributing guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

@istio-testing istio-testing added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 1, 2023
Copy link
Member

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

I think there are a few things to figure out:

  1. If we should do this at all, of course
  2. Policy attachment or filter.
  3. If this is a temporary thing until upstreamed, or a permanent API (although some will move, of course)
    a. If its temporary, how we make sure it stays temporary
    b. If its permanent, how we will handle fields being upstreamed

cc @costinm @robscott

@frankbu
Copy link
Contributor Author

frankbu commented May 1, 2023

  1. If we should do this at all, of course

I think we have to. If we want to get users to start moving off of the Istio API relatively soon I think we need to have at least some way to handle the functional gaps. Waiting for upstream, for some of these things at least, will hold us up too long IMO.

  1. Policy attachment or filter.

In general, I think we'll need some of both, but unless there is a plan to deprecate extensionRef upstream (in favor of policy attachments), I think route filter is the best way to handle at least for the three extensions I've added in this PR.

  1. If this is a temporary thing until upstreamed, or a permanent API (although some will move, of course)
    a. If its temporary, how we make sure it stays temporary
    b. If its permanent, how we will handle fields being upstreamed

My 2 cents. I think it should be considered permanent, and we just deprecate things as soon as there is a way to do it upstream.

For the very short term, maybe we should not generate docs for this, just use it internally for experimentation and prototyping. Then in a few months we can start to pick and chose what to document vs hide_from_docs as things become more clear?

@robscott
Copy link

robscott commented May 1, 2023

Hey @frankbu, would you be interested in pushing these concepts forward in Gateway API? I haven't spent as much time looking into fault injection, but retry configuration is clearly portable and there seems to be consensus in the community to move forward (as tracked by kubernetes-sigs/gateway-api#1731). Might be worth starting with a GEP for that.

@frankbu
Copy link
Contributor Author

frankbu commented May 1, 2023

Hey @frankbu, would you be interested in pushing these concepts forward in Gateway API?

@robscott, yes, I could do that, but I'm also trying to solve the general problem of all the Istio things that are missing in Gateway API, some of which I'm thinking will not make it into upstream any time soon if ever, but maybe that's not the case for the 3 specific features I'm starting with in this PR?

@robscott
Copy link

robscott commented May 1, 2023

Hey @frankbu I missed timeouts in the scope when I first looked at this. Those are also likely to be covered by kubernetes-sigs/gateway-api#1742. Wherever possible, I think it's a win for everyone if we can redirect efforts spent on implementation-specific APIs into portable Gateway API features. I think that's very likely to be the case for both retries and timeouts, I'm not as sure about fault injection. If you're able to help push these concepts into the main API that would help improve Gateway API and avoid future conflicts in Istio when these features do make it in.

google.protobuf.Duration timeout = 1;

// Retry policy for HTTP requests.
istio.networking.v1beta1.HTTPRetry retries = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Istio default retry is 3. What's the default semantics with this config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we push this to upstream Gateway API, I think the default would be implementation dependent. Unless I'm missing something, I think the Istio default of 3 is not specified in the API either, it just happens to be 3 with default settings of all the related fields: timeout, attempts, perTryTimeout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the default be implementation dependent? This has a big implication on reliability of the data plane.

There are actually two defaults-related questions:

  • What happens if I don't specify the override in any routes? Does it still set to 3, and how can I change it?
  • What happens if I specify the override but omit the field?

// -->
message RouteFilter {
// Timeout for HTTP requests, default is disabled.
google.protobuf.Duration timeout = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this downstream or upstream time out? Is it protocol specific time out (http/1.1 or h2)? Is it a stream or a whole HTTP connection time out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the HTTP upstream request timeout, the same thing we currently configure in the http route of an Istio VirtualService. I realize there are other timeouts that we will need to be able to configure, but those are more likely to be done using Gateway API PolicyAttachement, instead of in an HTTPRouteFilter extension like this one. This PR is just intended to be a start of filling in the Istio vs. Gateway API functionality gaps.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there are two routes with different timeouts? If it's the upstream timeout, then that would prevent sharing the upstream cluster, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think I meant it's the downstream timeout. It's configured per client request to the upstream cluster. The istio equivalent config is the timeout field in here: https://istio.io/latest/docs/reference/config/networking/virtual-service/#HTTPRoute.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, please make this explicit as "downstream request timeout". There are many timeouts working together (I think 4 of them), and Istio ambiguity is not helpful.

// Fault injection policy to apply on HTTP traffic at the client side.
// Note that timeouts or retries will not be enabled when faults are
// enabled on the client side.
istio.networking.v1beta1.HTTPFaultInjection fault = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think faults should be mingled with protocol settings. Faults are a data plane extension and should be layered as such.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I have them together is because all 3 fields in this PR are 3 missing route configuration fields that we currently have together in Istio's HTTPRoute resource. So, I put them together so we only need to define a single Istio CRD for extending Gateway API with any of these properties.

It's a good point, though. If we push these things up to Gateway API proper, instead of using an extensionRef resource, then I think they will all be separate filters: RequestRetryFilter, RequestTimeoutFilter, and RequestFaultFilter, which would be better (esp. for faults).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'd expect Faults to be certainly separate from timeout/retries. It's a different persona, too.

@costinm
Copy link
Contributor

costinm commented May 1, 2023

I think there are 2 separate issues:

  1. Upstream defining common APIs that multiple vendors can support
  2. Istio users having ability to use Istio features with the Gateway API

For the first point - I think the slow process of getting vendor consensus and review is very important.

For the second - which I believe is the intent of this API - I think we should first make sure that current Istio APIs
can attach to Gateway, Services and routes - without duplicating the API. We do have the existing merge
mechanism, where Istio VirtualServices, Gatweays etc can be combined with K8S Gateway - to some extent,
and we can make it more.

Either way - I don't think defining another round of Istio APIs is the right approach.

@costinm
Copy link
Contributor

costinm commented May 1, 2023

In other words: given an Istio VirtualService and a Gateway HttpRoute that generates an internal VirtualService - can we merge the 2 in some way, so Istio VirtualService acts as an extension ? Some 'section id' and naming rules ?

@frankbu
Copy link
Contributor Author

frankbu commented May 2, 2023

can we merge the 2 in some way, so Istio VirtualService acts as an extension ?

Yes, I had thought about that. I think it could be done either as an extensionRef, or more like a PolicyAttachement, depending on which resource we want to point at the other. Either way, if we don't want lots of duplication in the 2 resources, I think the Istio resource would need some kind of annotation or other marking to prevent the validator from rejecting a partial config. Taking my timeout/retry example, I think using extensionRef the config could look something like this:

kind: HTTPRoute
metadata:
  name: reviews
spec:
  parentRefs:
  - kind: Service
    name: reviews
    port: 9080
  rules:
  - filters:
    - type: ExtensionRef
      extensionRef:
        group: networking.istio.io/v1beta1
        kind: VirtualService
        name: reviews-v1
    backendRefs:
    - name: reviews-v1
      port: 9080
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-v1
  annotations:
    gateway.istio.io/extension: "true"
spec:
  http:
  - timeout: 10s
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: connect-failure,refused-stream,503

If, on the other hand, we want it to look more like a PolicyAttachement, i.e., have the VS point the the HTTPRoute, then I think it could look something more like this:

kind: HTTPRoute
metadata:
  name: reviews
spec:
  parentRefs:
  - kind: Service
    name: reviews
    port: 9080
  rules:
  - backendRefs:
    - name: reviews-v1
      port: 9080
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-v1
  annotations:
    gateway.istio.io/extension: "kind: HTTPRoute, name: reviews, field: rules[0]"
spec:
  http:
  - timeout: 10s
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: connect-failure,refused-stream,503

Thoughts?

@kyessenov
Copy link
Contributor

@frankbu I agree with Costin that we should find a way to define attachment for existing API resources without defining yet another API - the real API should be cross-vendor but would take longer to settle. I think it would be best for the attachment to be consistent with the real API - whatever that is.

Do we have any proposals for the "workload selector" replacement? This has come up for Waypoints in Ambient binding Authorization policies - I think we need a Gateway approach as well and it should be consistent. The same solution will generalize to Wasm, Telemetry, and EnvoyFilter, which all use the workload selectors.

@frankbu
Copy link
Contributor Author

frankbu commented May 2, 2023

I think we need a Gateway approach as well and it should be consistent.

@kyessenov Gateway defines a struct that policy attachments are supposed to use to reference Gateway resources. It's currently not able to identify specific sections, but I think that's going to be needed. I wonder if we should be consistent with it for Istio Gateway attachments? https://github.com/kubernetes-sigs/gateway-api/blob/c67f3fd31976ea74ea06a78c2a88ae38df95a41d/apis/v1alpha2/policy_types.go#L24

@kyessenov
Copy link
Contributor

@frankbu Is the idea that we use this PolicyAttachment struct as a field in VirtualService and other Istio APIs?

@costinm
Copy link
Contributor

costinm commented May 2, 2023

I don't even think we need annotation. VirtualService can attach to a gateway by name - we can add a section
( gatewaynamespace/mygateay.sectionId ). It would be most intuitive way to do it IMO.

I don't mind if we add a parentRef field to VirtualService as well - but that will take time to propagate ( CRD changes
are tricky with older versions of Istio still around ), and I hope this is all temporary until the Istio features moe
upstream.

For Gateway - I was just talking with John it is already possible, just need Istio Gateway to use the selector
matching the auto-deployed gateway. I have used it in a test. I think VirtualService works too - except the section ( it now applies to all listeners, which is not ideal ).

DestinationRule - it works already since it attaches by name. We need to make it clear that in K8S Gateway
world it is only 'producer side' and same namespace with the service it attaches to.

ServiceEntry is standalone and defines the equivalent of a Service - so I don't think there are problems, we may revisit
when egress is defined in K8S Gateway.

In general - I would minimize the changes in Istio - we can add the parentRef and other fields just to look
nice, but it will be nice to have it working with the current CRDs. After all the K8S Gateway API was based
on Istio APIs, with cleanups and better isolation.

@costinm
Copy link
Contributor

costinm commented May 2, 2023

And I don't think we need to make huge changes for this - the goal is for users to start primarily interacting with
K8S Gateway ( where Istiod can auto-manage the deployment ) and use HttpRoute when possible, and use the
Istio old APIs only for things not covered in the upstream API or while migrating.

More important would be to update our docs - we were discussing having a new set of docs focused on
'safe istio' ( name TBD), ambient, gateway - where we can mention how current Istio APIs can be used
as extensions for things not yet available.

At the same time - we should contribute upstream and try to get as much functionality as we can in
'optional' or even 'core' policies, shared with all other vendors. I don't think using VirtualService or istio
Gateway or DestinationRule should be more than a short term migration helper.

@frankbu
Copy link
Contributor Author

frankbu commented May 2, 2023

@frankbu Is the idea that we use this PolicyAttachment struct as a field in VirtualService and other Istio APIs?

@kyessenov Yes, that's what I was thinking, but Costin's comment:

but that will take time to propagate ( CRD changes are tricky with older versions of Istio still around ), and I hope this is all temporary until the Istio features moe upstream.

makes me think maybe we should instead try to stick with the current CRDs and just use name matching tricks to connect them, if it really is just a short term issue. I had been going on the assumption that some of Istio's features will never make it into Gateway API, so we should have a clean API for the extensions, but maybe we should wait and see first.

@frankbu
Copy link
Contributor Author

frankbu commented May 2, 2023

the goal is for users to start primarily interacting with K8S Gateway ( where Istiod can auto-manage the deployment ) and use HttpRoute when possible, and use the Istio old APIs only for things not covered in the upstream API or while migrating.

@costinm Just to clarify, I think you are suggesting that for my example (a route with timeout/retry) we would not use an HTTPRoute at all, but instead attach a VirtualService (with the full route config) to the gateway. My previous thinking was that we would use HTTPRoute for the route config but then use an Istio config resource to define only the timeout/retry properties, and we would merge the two into the generated VirtualService bound to the gateway. If I'm understanding it right, your proposal is simpler: just use VirtualService for the whole thing when HTTPRoute doesn't support something that's needed. Is that right?

@costinm
Copy link
Contributor

costinm commented May 3, 2023

Yes, if you need a feature not provided by Gateway - user can continue with VirtualService until upstream CRD exists.
There is no need to invent another API when we have a working and tested one.

It is true that some Istio features will never make it to the Gateway - that's one of the reasons there is no plan
to deprecate the current APIs. The other reason is that a lot of users and tools still use VirtualService - it would
be nice if it worked with the gateways managed by Istiod ( i.e. auto-created by K8S Gateway ) and mix usage
of HttpRoute and VirtualService.

Will also provide a gradual and smooth migration.

@frankbu
Copy link
Contributor Author

frankbu commented May 5, 2023

Closing this in favor of:

  1. Upstream discussion started here: GEP-1742: HTTPRoute Timeouts API kubernetes-sigs/gateway-api#1997
  2. Start to document when and how to use Istio config together with Gateway API config

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants