tl;dr
- FULL_DUPLEX_STREAMED mode is now on by default
- Helm charts for InferencePool are now published
- Many smaller polish items resolved
What's Changed
- Add the base model of the cpu vllm sample app to InferenceModel.yaml by @liu-cong in #481
- Fix: Updates Docs for Sidecar Requirements by @danehans in #484
- Switch default serving and health check ports for bbr by @rramkumar1 in #487
- Fix: e2e test dir and manifest naming by @danehans in #488
- Amend the endpoint picker protocol to support fallbacks and subsetting by @ahg-g in #445
- Update Makefile for BBR to ensure all proper tags are added by @rramkumar1 in #490
- Improve response handling issues. by @kfswain in #494
- Add metrics for BBR extension by @rramkumar1 in #468
- [Metrics] Add vLLM streaming support for metrics by @JeffLuoo in #329
- added support for testing cpu example in e2e tests by @nirrozenbaum in #485
- Redesign EPP Metrics Pipeline to be Model Server Agnostic by @BenjaminBraunDev in #461
- Update GO version to 1.24 by @BenjaminBraunDev in #501
- Fixing image build and adding image building to test runs by @kfswain in #502
- Create inference model/pool objects in memory instead of reading them from files by @ahg-g in #505
- Refactor the integration tests setup by @ahg-g in #506
- fix log line by @ahg-g in #509
- update release version by @nirrozenbaum in #512
- Add nil option for metric_spec to specify metrics to not be scraped. by @BenjaminBraunDev in #503
- switch to using formal vllm-cpu image by @nirrozenbaum in #511
- cleanup logging by @kfswain in #514
- Rename ext_proc.yaml to inferencepool.yaml by @ahg-g in #515
- Bump the kubernetes group with 6 updates by @dependabot in #520
- Update extension-policy to match the new epp service name by @ahg-g in #522
- Bump github.com/prometheus/common from 0.62.0 to 0.63.0 by @dependabot in #519
- Refactor beforeSuite in integration tests by @ahg-g in #508
- Split the extension policy since it is envoy specific by @ahg-g in #524
- Docs: Uses tabs for quickstart model server options by @danehans in #527
- Add instructions to run benchmarks by @liu-cong in #480
- add helm template by @Kuromesi in #416
- bump vllm-cpu image to latest by @nirrozenbaum in #530
- removed hf token from cpu based example by @nirrozenbaum in #464
- Bump golang.org/x/net from 0.35.0 to 0.36.0 by @dependabot in #529
- Move benchmark under tools by @liu-cong in #534
- fixed rbac in helm chart by @ahg-g in #531
- Support full duplex streaming in body-based routing extension by @rramkumar1 in #463
- Simplifying EPP-side buffer by @kfswain in #538
- integration test stability improvements by @kfswain in #541
- Add inferencepool chart push mechanics by @ahg-g in #540
- Updated the image used for cloudbuild by @ahg-g in #542
- setting gotoolchain to auto by @ahg-g in #543
- Simplify body streaming for BBR by @rramkumar1 in #544
- Bug fix: Initialize RequestReceivedTimestamp by @liu-cong in #539
- [Metrics] Handle vLLM streaming response in streaming server by @JeffLuoo in #518
- Add some more unit tests for BBR by @rramkumar1 in #545
- Tag the main version of the helm chart with v0 by @ahg-g in #547
- Default to streaming mode by @ahg-g in #552
- Initial helm chart for bbr by @rramkumar1 in #546
- Add makefile configs for bbr helm chart by @rramkumar1 in #553
- Adding deprecation notice of BUFFERED mode on patch policy. by @kfswain in #560
- Allow bodyless requests to passthrough EPP by @kfswain in #555
- remove controller-runtime dependency from API by @kfswain in #565
- Swapping out flow image by @kfswain in #562
- Update boilerplate template by @kfswain in #566
- Allow partial metric updates by @liu-cong in #561
- Removing unsafe lib by switching to atomic.Pointer by @kfswain in #567
- Bump google.golang.org/protobuf from 1.36.5 to 1.36.6 by @dependabot in #568
- Bump github.com/onsi/gomega from 1.36.2 to 1.36.3 by @dependabot in #569
- Bump sigs.k8s.io/controller-runtime from 0.20.3 to 0.20.4 by @dependabot in #570
- Configure the vllm deployment with best practices for startup by @smarterclayton in #550
- Configure gpu-deployment.yaml to force vLLM v1 with LoRA by @smarterclayton in #573
- Cleanup logging in the request scheduling path by @ahg-g in #583
- minor update to Makefile by @nirrozenbaum in #588
- Adding printer columns to inference model by @kfswain in #574
- Add provider-specific manifests for BBR helm chart by @rramkumar1 in #585
- helm-improvements by @LiorLieberman in #590
- Setting zap to emit logs as JSON in the deployment. by @kfswain in #591
- Updating llama 2 7b to llama 3.1 8b Instruct and adding new LoRA adapters by @kfswain in #578
- Renaming resources to better mirror how names are expected to be used by @kfswain in #592
- update algorithm parameters from env variables by @kaushikmitr in #580
- update benchmarking guide with latest results with vllm v1 by @kaushikmitr in #559
- Added provider support to InferencePool helm chart by @ahg-g in #595
- make dynamic lora sidecar health check parameters configurable and force reconcile by @kaushikmitr in #605
- Fix verbosity flag in BBR helm chart by @rramkumar1 in #606
- Adding getting started instructions for GKE, Istio, and Kgateway by @nicolexin in #577
- Add support for configuring ports in BBR helm chart by @rramkumar1 in #601
- Fix label selector on the ClusterPodMonitoring object by @ahg-g in #611
- Removing obsolete part of metrics guide by @robscott in #608
- Allow defining a default base model in the lora syncer configuration by @kaushikmitr in #609
- add namespace parameter to ClusterPodMonitoring secret reference by @ahg-g in #612
- Various fixes to docs and example manifests names by @ahg-g in #613
- Docs: Quickstart Fixes by @danehans in #615
- Adding terminationGracePeriodSeconds to match vLLM's by @kfswain in #614
- Add running request gauge metric by @JeffLuoo in #604
- [Metrics] Add number of ready pods metric for inference pool by @JeffLuoo in #622
- Fixes Adapter ConfigMap Name Refs by @danehans in #623
- Fixes Kgateway in Quickstart Guide by @danehans in #616
- Update release script by @kfswain in #625
- More release updates by @kfswain in #628
- Patch main/d9f737 to 0.3 release by @rramkumar1 in #639
Full Changelog: v0.2.0...v0.3.0