-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PoC implementation #4
Conversation
cluster: original_destination_cluster | ||
timeout: 1000s # Increase route timeout | ||
http_filters: | ||
- name: envoy.filters.http.ext_proc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey, there's a first class API for ext proc which could replace most of the bootstrap config
https://gateway.envoyproxy.io/docs/tasks/extensibility/ext-proc/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How widely is the API supported? If it's more widely supported than the bootstrap config (or dramatically easier for most who would want to run the demo), then I agree we should follow up with that quickly. Or present it as an alternative.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the bootstrap has some hardcoded values like
socket_address:
address: 34.118.231.147
port_value: 9002
so this is most likely not going to work when reproduced
The Ext Proc API will eliminate the need to hardcode IPs and ports as well as reduce a lot of the envoy proxy filter config which is generated internally.
The API was introduced in v1.1, its in v1alpha1
but its here to stay, and so is the bootstrap
field, the API provides a better interface to the user for this use case
If this runs, I would suggest we merge and move ahead and take additional feedback as comments to the merged PR or new PRs / issues. The value of the PoC is people trying it, not reviewing it. Also, I'd suggest we follow Kube conventions for commits, which are:
Squash commits and rebase down to minimal diffs (you can have multiple commits if each is logically distinct, like adding a test that fails and then fixing the issue, or delivering a fix before your main feature changes). |
Ack. Can we do this at merge? Or is it typically before? |
Generally recommend everyone squash before they get approval, with a good comment and description that a future version of yourself and others can interpret to understand why the code was included. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this runs, I would suggest we merge and move ahead and take additional feedback as comments to the merged PR or new PRs / issues. The value of the PoC is people trying it, not reviewing it.
I agree. It would be easier to test out once the PR is merged. This project is in early phase without releases yet so people should understand the risk.
Done |
- Go (for local development) | ||
|
||
## vLLM | ||
***This PoC uses a modified vLLM fork, the public image of the fork is here: `ghcr.io/tomatillo-and-multiverse/vllm:demo`*** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to use a modified fork?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document why the fork.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And specifically link to the commits/commit-range/branch/PR that added them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I linked to the fork, there is not currently a PR for the fork
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kfswain, smarterclayton The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Implementation of PoC shared in wg-serving.
Making the code public so that other contributors can test our implementation, reproduce numbers and check assumptions.