[Metrics] Add vLLM streaming support for metrics #329
Conversation
Hi @JeffLuoo. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
✅ Deploy Preview for gateway-api-inference-extension ready!
/ok-to-test
Force-pushed from 1d66114 to 357d7c8
It seems the change needs to be rebased on top of #372.
Force-pushed from 357d7c8 to 3970bf9
Force-pushed from 3970bf9 to 4249311
Force-pushed from 00c54e0 to a90bd84
Force-pushed from a90bd84 to c6178c2
/lgtm
/hold
/remove-approve
Sorry to be a stickler, but I think we should hold on this. We are shooting for an RC tomorrow, and I think we can refactor how we handle streaming here. Follow-up PRs should help.
pkg/epp/handlers/response.go (Outdated)
if reqCtx.Streaming {
	logger.V(logutil.DEBUG).Info("Processing HandleResponseBody")

responseText := string(body.ResponseBody.Body)
We seem to throw these stream chunks away after this loop. What if your data is split between two chunks? You have no guarantee whether that will or won't happen.
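To illustrate the concern with a hypothetical, standalone sketch (not the PR's code; the type and function names are made up): if each ext_proc body chunk is parsed on its own, an event whose bytes straddle two chunks can be dropped, whereas buffering the incomplete tail across chunks avoids that.

package main

import (
	"fmt"
	"strings"
)

// chunkBuffer accumulates streamed body chunks and only yields complete
// SSE events (terminated by "\n\n"), so an event split across two
// ext_proc chunks is not lost.
type chunkBuffer struct {
	pending strings.Builder
}

// push appends a raw chunk and returns any complete events it now holds.
func (b *chunkBuffer) push(chunk string) []string {
	b.pending.WriteString(chunk)
	buffered := b.pending.String()

	var events []string
	for {
		idx := strings.Index(buffered, "\n\n")
		if idx < 0 {
			break
		}
		events = append(events, buffered[:idx])
		buffered = buffered[idx+2:]
	}

	// Keep the incomplete tail for the next chunk.
	b.pending.Reset()
	b.pending.WriteString(buffered)
	return events
}

func main() {
	var b chunkBuffer
	// The second event arrives split across two chunks.
	fmt.Println(b.push("data: {\"text\":\"Hello\"}\n\ndata: {\"te"))
	fmt.Println(b.push("xt\":\" world\"}\n\n"))
}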
I don't quite get what you mean. Isn't streaming intended to break down messages into different chunks? E.g.
HTTP/1.1 200 OK
date: Wed, 05 Mar 2025 22:02:33 GMT
server: uvicorn
content-type: text/event-stream; charset=utf-8
x-went-into-resp-headers: true
transfer-encoding: chunked
data: {"id":"cmpl-33e790a5-7f21-42a9-8d71-6531a679f4a5","object":"text_completion","created":1741212154,"model":"tweet-summary-1","choices":[{"index":0,"text":"\n","logprobs":null,"finish_reason":null,"stop_reason":null}],"usage":null}
data: {"id":"cmpl-33e790a5-7f21-42a9-8d71-6531a679f4a5","object":"text_completion","created":1741212154,"model":"tweet-summary-1","choices":[{"index":0,"text":" Write","logprobs":null,"finish_reason":null,"stop_reason":null}],"usage":null}
data: {"id":"cmpl-33e790a5-7f21-42a9-8d71-6531a679f4a5","object":"text_completion","created":1741212154,"model":"tweet-summary-1","choices":[{"index":0,"text":" a","logprobs":null,"finish_reason":null,"stop_reason":null}],"usage":null}
data: {"id":"cmpl-33e790a5-7f21-42a9-8d71-6531a679f4a5","object":"text_completion","created":1741212154,"model":"tweet-summary-1","choices":[{"index":0,"text":" summary","logprobs":null,"finish_reason":null,"stop_reason":null}],"usage":null}
data: {"id":"cmpl-33e790a5-7f21-42a9-8d71-6531a679f4a5","object":"text_completion","created":1741212154,"model":"tweet-summary-1","choices":[{"index":0,"text":" of","logprobs":null,"finish_reason":null,"stop_reason":null}],"usage":null}
data: {"id":"cmpl-33e790a5-7f21-42a9-8d71-6531a679f4a5","object":"text_completion","created":1741212154,"model":"tweet-summary-1","choices":[{"index":0,"text":" the","logprobs":null,"finish_reason":null,"stop_reason":null}],"usage":null}
data: {"id":"cmpl-33e790a5-7f21-42a9-8d71-6531a679f4a5","object":"text_completion","created":1741212154,"model":"tweet-summary-1","choices":[{"index":0,"text":" plot","logprobs":null,"finish_reason":null,"stop_reason":null}],"usage":null}
data: {"id":"cmpl-33e790a5-7f21-42a9-8d71-6531a679f4a5","object":"text_completion","created":1741212154,"model":"tweet-summary-1","choices":[{"index":0,"text":".","logprobs":null,"finish_reason":null,"stop_reason":null}],"usage":null}
data: {"id":"cmpl-33e790a5-7f21-42a9-8d71-6531a679f4a5","object":"text_completion","created":1741212154,"model":"tweet-summary-1","choices":[{"index":0,"text":"\n","logprobs":null,"finish_reason":null,"stop_reason":null}],"usage":null}
data: {"id":"cmpl-33e790a5-7f21-42a9-8d71-6531a679f4a5","object":"text_completion","created":1741212154,"model":"tweet-summary-1","choices":[{"index":0,"text":"Write","logprobs":null,"finish_reason":"length","stop_reason":null}],"usage":null}
data: [DONE]
and the for loop in Process will iterate over all chunks until the stream is complete.
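For context, here is a rough standalone sketch of how the final usage counts could be pulled out of the accumulated data: lines once [DONE] is observed. This is not the PR's implementation; it assumes the backend is configured to emit a usage object in a trailing chunk, and the type and function names are illustrative.

package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// usage mirrors only the fields the metrics code needs; field names follow
// the OpenAI-style completion response shown above.
type usage struct {
	PromptTokens     int `json:"prompt_tokens"`
	CompletionTokens int `json:"completion_tokens"`
}

type streamChunk struct {
	Usage *usage `json:"usage"`
}

// lastUsage scans the accumulated "data:" lines of a finished stream and
// returns the usage from the last chunk that carried one.
func lastUsage(stream string) *usage {
	var last *usage
	scanner := bufio.NewScanner(strings.NewReader(stream))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, "data: ") || line == "data: [DONE]" {
			continue
		}
		var c streamChunk
		if err := json.Unmarshal([]byte(strings.TrimPrefix(line, "data: ")), &c); err != nil {
			continue // ignore malformed lines in this sketch
		}
		if c.Usage != nil {
			last = c.Usage
		}
	}
	return last
}

func main() {
	stream := `data: {"choices":[{"text":"Hi"}],"usage":null}
data: {"choices":[],"usage":{"prompt_tokens":11,"completion_tokens":8}}
data: [DONE]`
	if u := lastUsage(stream); u != nil {
		fmt.Println("prompt:", u.PromptTokens, "completion:", u.CompletionTokens)
	}
}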
Force-pushed from ed0c231 to 0b8f14f
Force-pushed from 0b8f14f to b82103b
@@ -124,7 +124,11 @@ func (s *Server) Process(srv extProcPb.ExternalProcessor_ProcessServer) error {
	metrics.RecordInputTokens(reqCtx.Model, reqCtx.ResolvedTargetModel, reqCtx.Response.Usage.PromptTokens)
	metrics.RecordOutputTokens(reqCtx.Model, reqCtx.ResolvedTargetModel, reqCtx.Response.Usage.CompletionTokens)
}
loggerVerbose.Info("Request context after HandleResponseBody", "context", reqCtx)
if reqCtx.Streaming {
I don't think this check is needed; consider removing it.
This is to prevent logs from being emitted too often. If the Envoy proxy is set to STREAMED, each response chunk prints a line of log, so a response with 1000 chunks would produce 1000 such Info-level lines.
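A minimal sketch of the gating being described, using go-logr's stdr backend as the logger; the helper name and verbosity constants are illustrative, not the project's actual logutil values.

package main

import (
	"log"
	"os"

	"github.com/go-logr/logr"
	"github.com/go-logr/stdr"
)

// Illustrative verbosity levels; the real project defines its own in logutil.
const (
	verbose = 1
	debug   = 4
)

// logResponseBody is a hypothetical helper showing the gating discussed
// above: streamed chunks only log at debug verbosity, so a 1000-chunk
// response does not produce 1000 Info-level lines.
func logResponseBody(logger logr.Logger, streaming bool, reqCtx any) {
	if streaming {
		logger.V(debug).Info("Handled streamed response chunk")
		return
	}
	logger.V(verbose).Info("Request context after HandleResponseBody", "context", reqCtx)
}

func main() {
	stdr.SetVerbosity(verbose) // debug-level lines are suppressed at this setting
	logger := stdr.New(log.New(os.Stderr, "", log.LstdFlags))
	logResponseBody(logger, true, nil)    // suppressed
	logResponseBody(logger, false, "ctx") // printed
}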
@@ -138,7 +142,11 @@ func (s *Server) Process(srv extProcPb.ExternalProcessor_ProcessServer) error {
	}
}

loggerVerbose.Info("Response generated", "response", resp)
if !reqCtx.Streaming {
Here also
See comment above, this is to prevent noisy logs.
Force-pushed from b82103b to d479c3d
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ahg-g, JeffLuoo. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold cancel
Addresses #178