Release v0.3.0 · kubernetes-sigs/gateway-api-inference-extension

tl;dr

FULL_DUPLEX_STREAMED is now on by default (#552), and BUFFERED mode is deprecated (#560)
Helm charts are now published for InferencePool (#540)
Many smaller polish items resolved

What's Changed

Add the base model of the cpu vllm sample app to InferenceModel.yaml by @liu-cong in #481
Fix: Updates Docs for Sidecar Requirements by @danehans in #484
Switch default serving and health check ports for bbr by @rramkumar1 in #487
Fix: e2e test dir and manifest naming by @danehans in #488
Amend the endpoint picker protocol to support fallbacks and subsetting by @ahg-g in #445
Update Makefile for BBR to ensure all proper tags are added by @rramkumar1 in #490
Improve response handling issues. by @kfswain in #494
Add metrics for BBR extension by @rramkumar1 in #468
[Metrics] Add vLLM streaming support for metrics by @JeffLuoo in #329
added support for testing cpu example in e2e tests by @nirrozenbaum in #485
Redesign EPP Metrics Pipeline to be Model Server Agnostic by @BenjaminBraunDev in #461
Update GO version to 1.24 by @BenjaminBraunDev in #501
Fixing image build and adding image building to test runs by @kfswain in #502
Create inference model/pool objects in memory instead of reading them from files by @ahg-g in #505
Refactor the integration tests setup by @ahg-g in #506
fix log line by @ahg-g in #509
update release version by @nirrozenbaum in #512
Add nil option for metric_spec to specify metrics to not be scraped. by @BenjaminBraunDev in #503
switch to using formal vllm-cpu image by @nirrozenbaum in #511
cleanup logging by @kfswain in #514
Rename ext_proc.yaml to inferencepool.yaml by @ahg-g in #515
Bump the kubernetes group with 6 updates by @dependabot in #520
Update extension-policy to match the new epp service name by @ahg-g in #522
Bump github.com/prometheus/common from 0.62.0 to 0.63.0 by @dependabot in #519
Refactor beforeSuite in integration tests by @ahg-g in #508
Split the extension policy since it is envoy specific by @ahg-g in #524
Docs: Uses tabs for quickstart model server options by @danehans in #527
Add instructions to run benchmarks by @liu-cong in #480
add helm template by @Kuromesi in #416
bump vllm-cpu image to latest by @nirrozenbaum in #530
removed hf token from cpu based example by @nirrozenbaum in #464
Bump golang.org/x/net from 0.35.0 to 0.36.0 by @dependabot in #529
Move benchmark under tools by @liu-cong in #534
fixed rbac in helm chart by @ahg-g in #531
Support full duplex streaming in body-based routing extension by @rramkumar1 in #463
Simplifying EPP-side buffer by @kfswain in #538
integration test stability improvements by @kfswain in #541
Add inferencepool chart push mechanics by @ahg-g in #540
Updated the image used for cloudbuild by @ahg-g in #542
setting gotoolchain to auto by @ahg-g in #543
Simplify body streaming for BBR by @rramkumar1 in #544
Bug fix: Initialize RequestReceivedTimestamp by @liu-cong in #539
[Metrics] Handle vLLM streaming response in streaming server by @JeffLuoo in #518
Add some more unit tests for BBR by @rramkumar1 in #545
Tag the main version of the helm chart with v0 by @ahg-g in #547
Default to streaming mode by @ahg-g in #552
Initial helm chart for bbr by @rramkumar1 in #546
Add makefile configs for bbr helm chart by @rramkumar1 in #553
Adding deprecation notice of BUFFERED mode on patch policy. by @kfswain in #560
Allow bodyless requests to passthrough EPP by @kfswain in #555
remove controller-runtime dependency from API by @kfswain in #565
Swapping out flow image by @kfswain in #562
Update boilerplate template by @kfswain in #566
Allow partial metric updates by @liu-cong in #561
Removing unsafe lib by switching to atomic.Pointer by @kfswain in #567
Bump google.golang.org/protobuf from 1.36.5 to 1.36.6 by @dependabot in #568
Bump github.com/onsi/gomega from 1.36.2 to 1.36.3 by @dependabot in #569
Bump sigs.k8s.io/controller-runtime from 0.20.3 to 0.20.4 by @dependabot in #570
Configure the vllm deployment with best practices for startup by @smarterclayton in #550
Configure gpu-deployment.yaml to force vLLM v1 with LoRA by @smarterclayton in #573
Cleanup logging in the request scheduling path by @ahg-g in #583
minor update to Makefile by @nirrozenbaum in #588
Adding printer columns to inference model by @kfswain in #574
Add provider-specific manifests for BBR helm chart by @rramkumar1 in #585
helm-improvements by @LiorLieberman in #590
Setting zap to emit logs as JSON in the deployment. by @kfswain in #591
Updating llama 2 7b to llama 3.1 8b Instruct and adding new LoRA adapters by @kfswain in #578
Renaming resources to better mirror how names are expected to be used by @kfswain in #592
update algorithm parameters from env variables by @kaushikmitr in #580
update benchmarking guide with latest results with vllm v1 by @kaushikmitr in #559
Added provider support to InferencePool helm chart by @ahg-g in #595
make dynamic lora sidecar health check parameters configurable and force reconcile by @kaushikmitr in #605
Fix verbosity flag in BBR helm chart by @rramkumar1 in #606
Adding getting started instructions for GKE, Istio, and Kgateway by @nicolexin in #577
Add support for configuring ports in BBR helm chart by @rramkumar1 in #601
Fix label selector on the ClusterPodMonitoring object by @ahg-g in #611
Removing obsolete part of metrics guide by @robscott in #608
Allow defining a default base model in the lora syncer configuration by @kaushikmitr in #609
add namespace parameter to ClusterPodMonitoring secret reference by @ahg-g in #612
Various fixes to docs and example manifests names by @ahg-g in #613
Docs: Quickstart Fixes by @danehans in #615
Adding terminationGracePeriodSeconds to match vLLMs by @kfswain in #614
Add running request gauge metric by @JeffLuoo in #604
[Metrics] Add number of ready pods metric for inference pool by @JeffLuoo in #622
Fixes Adapter ConfigMap Name Refs by @danehans in #623
Fixes Kgateway in Quickstart Guide by @danehans in #616
Update release script by @kfswain in #625
More release updates by @kfswain in #628
Patch main/d9f737 to 0.3 release by @rramkumar1 in #639

Full Changelog: v0.2.0...v0.3.0
