feat(DestinationRules): Adding aggression and min_weight_percent to DestinationRules API #3216

Merged

frgaudet
Contributor

@frgaudet frgaudet commented May 23, 2024

Adding envoy slowStartMode aggression and min_weight_percent parameters to DestinationRules API

Fixes #3215

Next PR to come on the cluster_traffic_policy side

First time I contribute here, hope this is good :)

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label May 23, 2024
@istio-policy-bot

😊 Welcome @frgaudet! This is either your first contribution to the Istio api repo, or it's been
a while since you've been here.

You can learn more about the Istio working groups, Code of Conduct, and contribution guidelines
by referring to Contributing to Istio.

Thanks for contributing!

Courtesy of your friendly welcome wagon.

linux-foundation-easycla bot commented May 23, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@istio-testing istio-testing added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. needs-ok-to-test labels May 23, 2024
@istio-testing
Collaborator

Hi @frgaudet. Thanks for your PR.

I'm waiting for a istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@frgaudet frgaudet marked this pull request as ready for review May 23, 2024 16:15
@frgaudet frgaudet requested a review from a team as a code owner May 23, 2024 16:15
Member

@hzxuzhonghu hzxuzhonghu left a comment

Can we wrap all the slow start config together?

// By tuning aggression parameter, one could achieve polynomial or exponential speed for traffic increase.
message aggression {
  uint32 default_value = 5;
  string runtime_key = 6;
Member

What do they mean, and can you provide a demo?

Contributor

I do not think we should expose a runtime key here. Also for how many services do you have to configure this? And does it differ from service to service?

Contributor Author

> I do not think we should expose a runtime key here. Also for how many services do you have to configure this? And does it differ from service to service?

As far as I remember, the runtime_key parameter is mandatory if we want to use the aggression parameter (which is the one we really need).

Contributor Author

> What do they mean, and can you provide a demo?

We run Java microservices, and the pods need a warmup phase before they reach full performance.

In practice, the goal is to avoid giving 100% of the traffic to a newly READY pod. A slow start lets us send it a small percentage of the traffic first, then ramp up progressively to 100%.

For our first attempt, we used the LoadBalancerSettings feature from Istio, which lets us specify the duration of the warmup. However, two important options cannot be configured there because the Istio API does not expose them:

min_weight_percent: the initial percentage of the original load sent to a new endpoint. If not set, it defaults to 10%.

aggression: defines how the percentage of traffic sent to the pods evolves from min_weight_percent to 100%. By default the ramp-up curve is linear, but customising it gives an exponential type of curve.

The result (sorry, I don't have a picture to illustrate it) is that 10% of the traffic is still too much: our latency increases a lot, which impacts our users.

To check whether the two parameters mentioned above affect our traffic, we applied an EnvoyFilter to 3 clients of our app.

image

Deploying this config for only a portion of our traffic (roughly 75%), with a slow_start_window of 3 minutes and a min_weight_percent of 1%, we were able to observe the progressive ramp-up of the traffic.
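To make the effect of the two parameters concrete, here is a minimal Python sketch of the slow start weight calculation as I understand it from Envoy's slow start documentation: new_weight = weight * max(min_weight_percent, time_factor ^ (1 / aggression)), with time_factor = max(time_since_start_seconds, 1) / slow_start_window_seconds. The function name, argument names, and the upper clamp at 1.0 are illustrative assumptions, not Envoy or Istio code.

```python
def slow_start_weight(weight, elapsed_s, window_s, aggression=1.0, min_weight_percent=10.0):
    """Approximate effective weight of an endpoint inside its slow start window.

    Illustrative only: the names and the clamp to the full weight are assumptions
    made for this sketch, following the formula quoted in the lead-in.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if aggression <= 0:
        raise ValueError("aggression must be positive")

    # Once the window has elapsed, the endpoint receives its full weight.
    if elapsed_s >= window_s:
        return float(weight)

    # Fraction of the window already elapsed (at least one second is counted).
    time_factor = max(elapsed_s, 1.0) / window_s

    # The scale never drops below the configured minimum percentage.
    scale = max(min_weight_percent / 100.0, time_factor ** (1.0 / aggression))

    # Assumption of this sketch: never exceed the full weight.
    return weight * min(scale, 1.0)


if __name__ == "__main__":
    # Values from the comment above: 3 minute window, 1% minimum.
    for t in (0, 45, 90, 135, 180):
        linear = slow_start_weight(100, t, 180, aggression=1.0, min_weight_percent=1.0)
        faster = slow_start_weight(100, t, 180, aggression=2.0, min_weight_percent=1.0)
        print(f"t={t:>3}s  aggression=1: {linear:6.2f}  aggression=2: {faster:6.2f}")
```

Under this formula, an aggression of 1 gives a linear ramp, while an aggression of 2 applies a square root to the time factor, so the endpoint reaches half of its weight after a quarter of the window.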

Contributor

Contributor Author

It is actually mandatory if we want to use the aggression parameter. If I try this EnvoyFilter:

apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: h2-control
spec:
  configPatches:
    - applyTo: CLUSTER
      match:
        cluster:
          name: "outbound|8080||http-echo.infra.svc.cluster.local"
      patch:
        operation: MERGE
        value:
          name: "outbound|8080||http-echo.infra.svc.cluster.local"
          lbPolicy: LEAST_REQUEST
          leastRequestLbConfig:
            slowStartConfig:
              min_weight_percent: { value: 99 }
              slow_start_window: "12s"
              aggression: { default_value: 2  }
  workloadSelector:
    labels:
      app: landing-f.gaudet

Then I get this warning in the logs:

landing-f.gaudet istio-proxy {"level":"warning","time":"2024-06-04T04:41:49.406026Z","scope":"envoy config","msg":"gRPC config for type.googleapis.com/envoy.config.cluster.v3.Cluster rejected: Proto constraint validation failed (ClusterValidationError.LeastRequestLbConfig: embedded message failed validation | caused by LeastRequestLbConfigValidationError.SlowStartConfig: embedded message failed validation | caused by SlowStartConfigValidationError.Aggression: embedded message failed validation | caused by RuntimeDoubleValidationError.RuntimeKey: value length must be at least 1 characters):

and the config is not applied. However, if I set the runtime key:

          leastRequestLbConfig:
            slowStartConfig:
              min_weight_percent: { value: 99 }
              slow_start_window: "12s"
              aggression: { default_value: 2, runtime_key: "(" }

Then the config is successfully applied:

istioctl pc cluster landing-f.gaudet.infra --fqdn http-echo.infra.svc.cluster.local -ojson | jq ".[].leastRequestLbConfig"


{
  "slowStartConfig": {
    "slowStartWindow": "12s",
    "aggression": {
      "defaultValue": 2,
      "runtimeKey": "("
    },
    "minWeightPercent": {
      "value": 99
    }
  }
}

@frgaudet
Contributor Author

> Can we wrap all the slow start config together?

Just to be sure I understand your request: you mean wrapping all the slowStart fields into a new message struct? What best practice would you recommend for handling such a proto change?

  message slowStart {
    google.protobuf.Duration warmup_duration_secs = 1;
    message aggression {
      uint32 default_value = 2;
      string runtime_key = 3;
    }
    uint32 min_weight_percent = 4;
  };

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label May 31, 2024
@frgaudet frgaudet force-pushed the fred/istio/adding-slow-start-parameters branch from 1ac690d to 78a2ea5 Compare June 5, 2024 10:02
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 5, 2024
@hzxuzhonghu
Member

Yes @frgaudet, I mean something like this.

@ramaraochavali
Contributor

Do you need separate value of aggression for each service?

@frgaudet
Contributor Author

frgaudet commented Jun 6, 2024

> Do you need separate value of aggression for each service?

Potentially yes: depending on the Java code, the warmup may need to be tuned differently.

@frgaudet
Contributor Author

@hzxuzhonghu @ramaraochavali do you need anything else?

@frgaudet frgaudet changed the title [WIP] feat(DestinationRules): Adding aggression and min_weight_percent to DestinationRules API feat(DestinationRules): Adding aggression and min_weight_percent to DestinationRules API Jun 18, 2024
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 18, 2024
@frgaudet
Contributor Author

@howardjohn do you think this could be included in the upcoming release?

@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 28, 2024
@frgaudet frgaudet force-pushed the fred/istio/adding-slow-start-parameters branch from 21db8c8 to 9442f42 Compare July 5, 2024 08:41
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jul 5, 2024
Signed-off-by: Frédéric Gaudet <[email protected]>
@frgaudet frgaudet force-pushed the fred/istio/adding-slow-start-parameters branch from 1201016 to 0246298 Compare September 26, 2024 07:30
@frgaudet frgaudet requested a review from howardjohn September 26, 2024 08:48
Signed-off-by: Frédéric Gaudet <[email protected]>
@istio-testing istio-testing added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Sep 30, 2024
@frgaudet
Contributor Author

frgaudet commented Sep 30, 2024

To keep this API as simple as possible, I defined duration as a mandatory field if we want to leverage this warmup configuration. Additionally:

  • aggression is optional, with a default value of 1 (linear increase of traffic)
  • minimumPercent is optional and defaults to 10

WDYT?

Thanks for your reviews :)
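The defaults described in this comment can be sketched as a small Python model. This is only an illustration: the class, field, and method names are hypothetical, the validation bounds are assumptions, and the output simply mirrors the shape of the slowStartConfig JSON shown earlier in this thread. How Istio itself translates these fields is not shown in this conversation.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class WarmupSettings:
    """Hypothetical model of the warmup settings described above (not Istio code)."""

    duration_s: float              # mandatory: length of the warmup window, in seconds
    aggression: float = 1.0        # optional: 1 means a linear increase of traffic
    minimum_percent: float = 10.0  # optional: initial percentage of the normal load

    def __post_init__(self) -> None:
        # The bounds below are assumptions made for this sketch.
        if self.duration_s <= 0:
            raise ValueError("duration must be positive")
        if self.aggression <= 0:
            raise ValueError("aggression must be positive")
        if not 0 <= self.minimum_percent <= 100:
            raise ValueError("minimum_percent must be between 0 and 100")

    def to_slow_start_config(self, runtime_key: str) -> Dict[str, Any]:
        """Render a dict shaped like the slowStartConfig output seen in this thread."""
        # The warning quoted earlier shows Envoy rejecting an empty runtime key.
        if not runtime_key:
            raise ValueError("runtime_key must contain at least 1 character")
        return {
            "slowStartWindow": f"{self.duration_s:g}s",
            "aggression": {
                "defaultValue": self.aggression,
                "runtimeKey": runtime_key,
            },
            "minWeightPercent": {"value": self.minimum_percent},
        }


if __name__ == "__main__":
    # Only the duration is supplied, so aggression and minimum_percent use their defaults.
    print(WarmupSettings(duration_s=180).to_slow_start_config(runtime_key="placeholder"))
```

Keeping duration as the only required field means a caller who just wants a warmup window never has to think about the other two knobs.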

@frgaudet frgaudet requested a review from howardjohn September 30, 2024 08:20
@howardjohn
Member

/ok-to-test

@istio-testing istio-testing added ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. and removed needs-ok-to-test labels Sep 30, 2024
Signed-off-by: Frédéric Gaudet <[email protected]>
@frgaudet
Contributor Author

frgaudet commented Oct 1, 2024

/retest

Signed-off-by: Frédéric Gaudet <[email protected]>
@frgaudet
Contributor Author

frgaudet commented Oct 1, 2024

/retest

Signed-off-by: Frédéric Gaudet <[email protected]>
@frgaudet
Contributor Author

frgaudet commented Oct 1, 2024

/retest

Member

@howardjohn howardjohn left a comment

Thanks!

@frgaudet
Contributor Author

frgaudet commented Oct 8, 2024

@ramaraochavali or @hzxuzhonghu, can you please have a look? Your approval is needed to move forward. Thanks!

@istio-testing istio-testing merged commit 2397ade into istio:master Oct 8, 2024
5 checks passed
@frgaudet frgaudet deleted the fred/istio/adding-slow-start-parameters branch October 16, 2024 14:24
Labels
ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Support agression and min_weight_percent in DestinationRule
6 participants