Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding getting started instructions for GKE, Istio, and Kgateway #577

Merged
merged 52 commits into from
Mar 28, 2025

Conversation

nicolexin
Copy link
Contributor

@nicolexin nicolexin commented Mar 26, 2025

Update inference extension getting started guide:

  • Add instructions for GKE, Istio and Kgateway
  • Remove instructions for Envoy Gateway until they have controller-level support

Copy link

linux-foundation-easycla bot commented Mar 26, 2025

@k8s-ci-robot
Copy link
Contributor

Welcome @nicolexin!

It looks like this is your first PR to kubernetes-sigs/gateway-api-inference-extension 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/gateway-api-inference-extension has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Mar 26, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @nicolexin. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label Mar 26, 2025
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 26, 2025
Copy link

netlify bot commented Mar 26, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit c82487d
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/67e71fbefa645e00082a1796
😎 Deploy Preview https://deploy-preview-577--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@liu-cong
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Mar 26, 2025
@robscott
Copy link
Member

Thanks @nicolexin!

For KGateway:
/cc @danehans @christian-posta

For Istio:
/cc @LiorLieberman

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nicolexin!

Comment on lines 127 to 129
1. If you run the Endpoint Picker (EPP) with TLS (with `--secureServing=true`), it is currently using a self-signed certificate
and the gateway cannot successfully validate the CA signature and the SAN. Apply the destination rule to bypass verification as
a temporary workaround. A better TLS implementation is being discussed in [Issue 582](https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/582).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LiorLieberman My goal with the suggestion was to highlight that Istio's TLS verification is a positive/helpful feature. IMO, the only time neutral OSS docs should point out a shortcoming of an implementation is if that implementation is failing to do something required by the API, that's not the case here.

Copy link
Member

@LiorLieberman LiorLieberman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work here @nicolexin! overall LGTM.
minor nits on some threads (nothing is actionable for now probably)

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 28, 2025
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 28, 2025
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nicolexin!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 28, 2025
@nicolexin
Copy link
Contributor Author

/assign kfswain

@ahg-g
Copy link
Contributor

ahg-g commented Mar 28, 2025

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, LiorLieberman, nicolexin, robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 28, 2025
@k8s-ci-robot k8s-ci-robot merged commit 673999e into kubernetes-sigs:main Mar 28, 2025
8 checks passed
@nicolexin
Copy link
Contributor Author

I've added my review of the instructions for kgateway and updated the thread above with the steps/commands.

A few thoughts from me:

  1. the gpu-deployment uses "vllm/vllm-openai:latest"; is that intended? I think we should pin to a specific version.

https://github.com/nicolexin/gateway-api-inference-extension/blob/userguide/config/manifests/vllm/gpu-deployment.yaml#L17

  1. this ClusterRoleBinding gives me an error when I apply it:

https://github.com/nicolexin/gateway-api-inference-extension/blob/userguide/config/manifests/inferencepool.yaml#L117

inferencepool.inference.networking.x-k8s.io/vllm-llama3-8b-instruct created
service/vllm-llama3-8b-instruct-epp created
deployment.apps/vllm-llama3-8b-instruct-epp created
clusterrole.rbac.authorization.k8s.io/pod-read unchanged
error: error validating "https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool.yaml": error validating data: ValidationError(ClusterRoleBinding.roleRef): missing required field "apiGroup" in io.k8s.api.rbac.v1.RoleRef; if you choose to ignore these errors, turn validation off with --validate=false
  1. With the startupProbe and agressive liveness/readiness I could not get the llm container to come up FYI. I understand why these settings are important -- we may need to fiddle with them a little to arrive at good defaults. Have others had any issues with this? I've had to back those out on my side to get the llms to work...

Thanks!

I did run through the guide with a GKE cluster end to end and I have no issues applying the ClusterRoleBinding and the vLLM deployments.
@LiorLieberman - did you run into any of the issues above?

@kfswain
Copy link
Collaborator

kfswain commented Mar 28, 2025

@christian-posta

  1. the gpu-deployment uses "vllm/vllm-openai:latest"; is that intended? I think we should pin to a specific version.

Yeah, we mark a specific branch in our version branches. Granted that doesn't make it to our site (we only host main). We may need to break out version specific guides. Cut: #610

  1. this ClusterRoleBinding gives me an error when I apply it:

Interesting, I think that's just a validation error, and they RoleBinding should still exist afaik? I did omit apiGroup but I think it defaults to rbac.authorization.k8s.io/v1. I always err on the side of less config, but we can fix this if it full errors out.

  1. With the startupProbe and agressive liveness/readiness I could not get the llm container to come up FYI. I understand why these settings are important -- we may need to fiddle with them a little to arrive at good defaults. Have others had any issues with this? I've had to back those out on my side to get the llms to work...

They work for me. I'm using A100s its possible we need to have a disclaimer that they are tuned for A100 machines. LMK

@kfswain
Copy link
Collaborator

kfswain commented Mar 28, 2025

Thanks @nicolexin!!! RIP xDS Surgery, you won't be missed :P

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.