Adding getting started instructions for GKE, Istio, and Kgateway #577

nicolexin · 2025-03-26T16:52:12Z

Update inference extension getting started guide:

- Add instructions for GKE, Istio, and Kgateway
- Remove instructions for Envoy Gateway until they have controller-level support

linux-foundation-easycla · 2025-03-26T16:52:20Z

The committers listed above are authorized under a signed CLA.

✅ login: nicolexin / name: Nicole Xin (9343660, 0519935, 3e7e74e, 6d48b5b, a140a3e, 8d235f6, d8d4666, 557c44f, 5a2677e, 63d7c40, 52318b3, 8ef12a8, 7b490de, ee7fa97, efb8c35, 8a878f8, 21100f9, e82e074, a627ea7, 9c8d00d, f6f9538, f0b59e4, 59cbe2e, afc64dc, 048189a, 35a835f, e512145, c06cffd, ff8b2a1, 6bc07c6, d493258, 2f9baea, c1b563b, 484f19f, 9cb2575, 6a9f91a, 0a24389, 2574453, 365d847, e4471ec, ce19438, e9f2298, c82487d, a679070, d0ddd16, b63263d, e1c0b1d, b6d4c7a, 6d3642a, d71f29c, d5fd70f, 41fc083)

k8s-ci-robot · 2025-03-26T16:52:21Z

Welcome @nicolexin!

It looks like this is your first PR to kubernetes-sigs/gateway-api-inference-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api-inference-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2025-03-26T16:52:22Z

Hi @nicolexin. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

netlify · 2025-03-26T16:52:40Z

✅ Deploy Preview for gateway-api-inference-extension ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | `c82487d` |
| 🔍 Latest deploy log | https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67e71fbefa645e00082a1796 |
| 😎 Deploy Preview | https://deploy-preview-577--gateway-api-inference-extension.netlify.app |

liu-cong · 2025-03-26T17:05:11Z

/ok-to-test

robscott · 2025-03-26T17:11:58Z

Thanks @nicolexin!

For KGateway:
/cc @danehans @christian-posta

For Istio:
/cc @LiorLieberman

robscott

Thanks @nicolexin!

config/manifests/gateway/istio/destination-rule.yaml

config/manifests/inferencepool.yaml

site-src/guides/index.md

robscott · 2025-03-28T21:02:00Z

site-src/guides/index.md

+      1. If you run the Endpoint Picker (EPP) with TLS (with `--secureServing=true`), it is currently using a self-signed certificate 
+      and the gateway cannot successfully validate the CA signature and the SAN. Apply the destination rule to bypass verification as 
+      a temporary workaround. A better TLS implementation is being discussed in [Issue 582](https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/582).


@LiorLieberman My goal with the suggestion was to highlight that Istio's TLS verification is a positive/helpful feature. IMO, the only time neutral OSS docs should point out a shortcoming of an implementation is if that implementation is failing to do something required by the API, and that's not the case here.

Co-authored-by: Rob Scott <[email protected]>
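
For readers following the thread, here is a minimal sketch of the kind of Istio DestinationRule the workaround above describes. It assumes the EPP Service is named `vllm-llama3-8b-instruct-epp` (the name that appears in the `kubectl` output later in this thread) and that the cluster serves the `networking.istio.io/v1` API. The resource name `epp-insecure-tls` is illustrative. The manifest actually added by this PR is `config/manifests/gateway/istio/destination-rule.yaml` and may differ.

```yaml
# Sketch only: tells Istio to originate TLS to the EPP Service but
# skip verification of its self-signed certificate.
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: epp-insecure-tls            # illustrative name (assumption)
spec:
  host: vllm-llama3-8b-instruct-epp # assumed EPP Service name
  trafficPolicy:
    tls:
      mode: SIMPLE                  # originate TLS to the backend
      insecureSkipVerify: true      # bypass CA signature and SAN checks
```

Because `insecureSkipVerify` disables certificate validation for this host, it fits only as the temporary workaround the suggestion describes.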

LiorLieberman

Thanks for all the work here @nicolexin! Overall LGTM.
Minor nits on some threads (probably nothing actionable for now).

/lgtm

robscott

Thanks @nicolexin!

/lgtm

nicolexin · 2025-03-28T22:25:36Z

/assign kfswain

ahg-g · 2025-03-28T22:33:57Z

/approve

k8s-ci-robot · 2025-03-28T22:34:05Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, LiorLieberman, nicolexin, robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [ahg-g]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

nicolexin · 2025-03-28T22:43:39Z

> I've added my review of the instructions for kgateway and updated the thread above with the steps/commands.
>
> A few thoughts from me:
>
> - the gpu-deployment uses "vllm/vllm-openai:latest"; is that intended? I think we should pin to a specific version.
>
>   https://github.com/nicolexin/gateway-api-inference-extension/blob/userguide/config/manifests/vllm/gpu-deployment.yaml#L17
>
> - this ClusterRoleBinding gives me an error when I apply it:
>
>   https://github.com/nicolexin/gateway-api-inference-extension/blob/userguide/config/manifests/inferencepool.yaml#L117
>
>       inferencepool.inference.networking.x-k8s.io/vllm-llama3-8b-instruct created
>       service/vllm-llama3-8b-instruct-epp created
>       deployment.apps/vllm-llama3-8b-instruct-epp created
>       clusterrole.rbac.authorization.k8s.io/pod-read unchanged
>       error: error validating "https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool.yaml": error validating data: ValidationError(ClusterRoleBinding.roleRef): missing required field "apiGroup" in io.k8s.api.rbac.v1.RoleRef; if you choose to ignore these errors, turn validation off with --validate=false
>
> - With the startupProbe and aggressive liveness/readiness I could not get the llm container to come up FYI. I understand why these settings are important -- we may need to fiddle with them a little to arrive at good defaults. Have others had any issues with this? I've had to back those out on my side to get the llms to work...
>
> Thanks!

I ran through the guide end to end on a GKE cluster and had no issues applying the ClusterRoleBinding or the vLLM deployments.
@LiorLieberman - did you run into any of the issues above?

kfswain · 2025-03-28T22:50:18Z

@christian-posta

> the gpu-deployment uses "vllm/vllm-openai:latest"; is that intended? I think we should pin to a specific version.

Yeah, we mark a specific branch in our version branches. Granted that doesn't make it to our site (we only host main). We may need to break out version specific guides. Cut: #610
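
As a sketch of what pinning could look like in the container spec of `gpu-deployment.yaml`: the container name and the tag `v0.7.2` below are illustrative assumptions, not values taken from this PR. Check the image registry for the tags that exist before using one.

```yaml
# Fragment of a Deployment pod template (sketch only).
containers:
  - name: vllm                         # illustrative container name (assumption)
    image: vllm/vllm-openai:v0.7.2     # pinned tag instead of :latest (tag is an assumption)
    imagePullPolicy: IfNotPresent      # a pinned tag does not need to be re-pulled on every start
```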

> this ClusterRoleBinding gives me an error when I apply it:

Interesting, I think that's just a validation error, and the RoleBinding should still exist afaik? I did omit apiGroup but I think it defaults to rbac.authorization.k8s.io/v1. I always err on the side of less config, but we can fix this if it fully errors out.
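
For reference, a minimal sketch of a ClusterRoleBinding whose `roleRef` sets `apiGroup` explicitly, which is the field the validation error above reports as missing. The ClusterRole name `pod-read` comes from the `kubectl` output quoted in this thread. The binding name and the ServiceAccount subject are illustrative assumptions and may not match `inferencepool.yaml`.

```yaml
# Sketch only: binds the pod-read ClusterRole to a ServiceAccount.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: pod-read-binding               # illustrative name (assumption)
subjects:
  - kind: ServiceAccount
    name: default                      # assumed subject
    namespace: default                 # assumed namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io  # the field named in the validation error
  kind: ClusterRole
  name: pod-read
```

Note that the `apiGroup` value is the group name alone, without a version suffix.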

> With the startupProbe and aggressive liveness/readiness I could not get the llm container to come up FYI. I understand why these settings are important -- we may need to fiddle with them a little to arrive at good defaults. Have others had any issues with this? I've had to back those out on my side to get the llms to work...

They work for me. I'm using A100s; it's possible we need to have a disclaimer that they are tuned for A100 machines. LMK
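
One way to relax the probes on slower hardware is sketched below. It assumes the vLLM server answers `GET /health` on port 8000; every timing value is an illustrative assumption, not the setting used in this PR. The idea is to give the startup probe a large budget for model loading while keeping the liveness and readiness probes fairly tight once the server is up.

```yaml
# Fragment of a container spec (sketch only; all values are assumptions).
startupProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 60     # up to 60 * 10s = 600s for the model to load
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 10
  failureThreshold: 3      # restart after ~30s of failed checks
readinessProbe:
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 5
  failureThreshold: 2      # stop routing traffic after ~10s of failed checks
```

Kubernetes holds off liveness and readiness checks until the startup probe succeeds, so raising `failureThreshold` on the startup probe lengthens the load-time allowance without loosening steady-state health checking.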

kfswain · 2025-03-28T22:51:54Z

Thanks @nicolexin!!! RIP xDS Surgery, you won't be missed :P

nicolexin added 13 commits March 25, 2025 14:33

Create resources.yaml for kgateway

6bc07c6

Update getting started guide for KGateway

63d7c40

Replace Envoy Gateway user guide with GKE user guide

048189a

Create resources.yaml for GKE Gateway

a679070

Delete config/manifests/gateway/enable_patch_policy.yaml

a627ea7

Delete config/manifests/gateway/gateway.yaml

7b490de

Delete config/manifests/gateway/patch_policy.yaml

9c8d00d

Delete config/manifests/gateway/traffic_policy.yaml

0519935

Add http2 appProtocol to EPP service

3e7e74e

Add user guide for Istio

a140a3e

Create resources.yaml for Istio

8a878f8

Fix GKE gateway name to match the user guide

f0b59e4

Fix cleanup instructions to refer up-to-date YAMLs

c06cffd

k8s-ci-robot added the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Mar 26, 2025

k8s-ci-robot added the cncf-cla: no label (indicates the PR's author has not signed the CNCF CLA) Mar 26, 2025

k8s-ci-robot requested review from liu-cong and robscott March 26, 2025 16:52

k8s-ci-robot added the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Mar 26, 2025

k8s-ci-robot requested review from LiorLieberman and danehans March 26, 2025 17:12

robscott reviewed Mar 28, 2025

nicolexin and others added 5 commits March 28, 2025 14:12

Add clarification on the EPP secureServing default value.

484f19f

Co-authored-by: Rob Scott <[email protected]>

Add instructions for configuring timeout

d71f29c

Create httproute-with-timeout.yaml

41fc083

Create gcp-backend-policy.yaml

d5fd70f

Add cleanup for GCPBackendPolicy

d0ddd16

LiorLieberman approved these changes Mar 28, 2025

k8s-ci-robot assigned LiorLieberman Mar 28, 2025

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Mar 28, 2025

Remove namespace from destination-rule.yaml

e1c0b1d

k8s-ci-robot removed the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Mar 28, 2025

nicolexin added 3 commits March 28, 2025 15:11

Rename inferencepool.yaml to inferencepool-resources.yaml

e4471ec

Rename inferencepool.yaml to inferencepool-resources.yaml

365d847

Rename inferencepool.yaml to inferencepool-resources.yaml

c82487d

robscott approved these changes Mar 28, 2025

k8s-ci-robot assigned robscott Mar 28, 2025

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Mar 28, 2025

k8s-ci-robot assigned kfswain Mar 28, 2025

k8s-ci-robot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Mar 28, 2025

k8s-ci-robot merged commit 673999e into kubernetes-sigs:main Mar 28, 2025
8 checks passed

robscott mentioned this pull request Mar 28, 2025

Removing obsolete part of metrics guide #608

Merged

kfswain mentioned this pull request Mar 28, 2025

Make Version specific sets of the getting started guide #610

Open

danehans mentioned this pull request Mar 31, 2025

Docs: Updates getting started guide for kgateway #575

Closed

nicolexin deleted the userguide branch April 1, 2025 15:58
