Releases: kubernetes-sigs/gateway-api-inference-extension
v0.3.0
tl;dr
- FULL_DUPLEX_STREAMED is on by default
- We have helm charts published for InferencePool
- Many smaller polish items resolved
What's Changed
- Add the base model of the cpu vllm sample app to InferenceModel.yaml by @liu-cong in #481
- Fix: Updates Docs for Sidecar Requirements by @danehans in #484
- Switch default serving and health check ports for bbr by @rramkumar1 in #487
- Fix: e2e test dir and manifest naming by @danehans in #488
- Amend the endpoint picker protocol to support fallbacks and subsetting by @ahg-g in #445
- Update Makefile for BBR to ensure all proper tags are added by @rramkumar1 in #490
- Improve response handling issues. by @kfswain in #494
- Add metrics for BBR extension by @rramkumar1 in #468
- [Metrics] Add vLLM streaming support for metrics by @JeffLuoo in #329
- added support for testing cpu example in e2e tests by @nirrozenbaum in #485
- Redesign EPP Metrics Pipeline to be Model Server Agnostic by @BenjaminBraunDev in #461
- Update GO version to 1.24 by @BenjaminBraunDev in #501
- Fixing image build and adding image building to test runs by @kfswain in #502
- Create inference model/pool objects in memory instead of reading them files by @ahg-g in #505
- Refactor the integration tests setup by @ahg-g in #506
- fix log line by @ahg-g in #509
- update release version by @nirrozenbaum in #512
- Add nil option for metric_spec to specify metrics to not be scraped. by @BenjaminBraunDev in #503
- switch to using formal vllm-cpu image by @nirrozenbaum in #511
- cleanup logging by @kfswain in #514
- Rename ext_proc.yaml to inferencepool.yaml by @ahg-g in #515
- Bump the kubernetes group with 6 updates by @dependabot in #520
- Update extension-policy to match the new epp service name by @ahg-g in #522
- Bump github.com/prometheus/common from 0.62.0 to 0.63.0 by @dependabot in #519
- Refactor beforeSuite in integration tests by @ahg-g in #508
- Split the extension policy since it is envoy specific by @ahg-g in #524
- Docs: Uses tabs for quickstart model server options by @danehans in #527
- Add instructions to run benchmarks by @liu-cong in #480
- add helm template by @Kuromesi in #416
- bump vllm-cpu image to latest by @nirrozenbaum in #530
- removed hf token from cpu based example by @nirrozenbaum in #464
- Bump golang.org/x/net from 0.35.0 to 0.36.0 by @dependabot in #529
- Move benchmark under tools by @liu-cong in #534
- fixed rbac in helm chart by @ahg-g in #531
- Support full duplex streaming in body-based routing extension by @rramkumar1 in #463
- Simplifying EPP-side buffer by @kfswain in #538
- integration test stability improvements by @kfswain in #541
- Add inferencepool chart push mechanics by @ahg-g in #540
- Updated the image used for cloudbuild by @ahg-g in #542
- setting gotoolchain to auto by @ahg-g in #543
- Simplify body streaming for BBR by @rramkumar1 in #544
- Bug fix: Initialize RequestReceivedTimestamp by @liu-cong in #539
- [Metrics] Handle vLLM streaming response in streaming server by @JeffLuoo in #518
- Add some more unit tests for BBR by @rramkumar1 in #545
- Tag the main version of the helm chart with v0 by @ahg-g in #547
- Default to streaming mode by @ahg-g in #552
- Initial helm chart for bbr by @rramkumar1 in #546
- Add makefile configs for bbr helm chart by @rramkumar1 in #553
- Adding deprecation notice of BUFFERED mode on patch policy. by @kfswain in #560
- Allow bodyless requests to passthrough EPP by @kfswain in #555
- remove controller-runtime dependency from API by @kfswain in #565
- Swapping out flow image by @kfswain in #562
- Update boilerplate template by @kfswain in #566
- Allow partial metric updates by @liu-cong in #561
- Removing unsafe lib by switching to atomic.Pointer by @kfswain in #567
- Bump google.golang.org/protobuf from 1.36.5 to 1.36.6 by @dependabot in #568
- Bump github.com/onsi/gomega from 1.36.2 to 1.36.3 by @dependabot in #569
- Bump sigs.k8s.io/controller-runtime from 0.20.3 to 0.20.4 by @dependabot in #570
- Configure the vllm deployment with best practices for startup by @smarterclayton in #550
- Configure gpu-deployment.yaml to force vLLM v1 with LoRA by @smarterclayton in #573
- Cleanup logging in the request scheduling path by @ahg-g in #583
- minor update to Makefile by @nirrozenbaum in #588
- Adding printer columns to inference model by @kfswain in #574
- Add provider-specific manifests for BBR helm chart by @rramkumar1 in #585
- helm-improvements by @LiorLieberman in #590
- Setting zap to emit logs as JSON in the deployment. by @kfswain in #591
- Updating llama 2 7b to llama 3.1 8b Instruct and adding new LoRA adapters by @kfswain in #578
- Renaming resources to better mirror how names are expected to be used by @kfswain in #592
- update algorithm parameters from env variables by @kaushikmitr in #580
- update benchmarking guide with latest results wi...
v0.3.0-rc.1
What's Changed
- Add the base model of the cpu vllm sample app to InferenceModel.yaml by @liu-cong in #481
- Fix: Updates Docs for Sidecar Requirements by @danehans in #484
- Switch default serving and health check ports for bbr by @rramkumar1 in #487
- Fix: e2e test dir and manifest naming by @danehans in #488
- Amend the endpoint picker protocol to support fallbacks and subsetting by @ahg-g in #445
- Update Makefile for BBR to ensure all proper tags are added by @rramkumar1 in #490
- Improve response handling issues. by @kfswain in #494
- Add metrics for BBR extension by @rramkumar1 in #468
- [Metrics] Add vLLM streaming support for metrics by @JeffLuoo in #329
- added support for testing cpu example in e2e tests by @nirrozenbaum in #485
- Redesign EPP Metrics Pipeline to be Model Server Agnostic by @BenjaminBraunDev in #461
- Update GO version to 1.24 by @BenjaminBraunDev in #501
- Fixing image build and adding image building to test runs by @kfswain in #502
- Create inference model/pool objects in memory instead of reading them files by @ahg-g in #505
- Refactor the integration tests setup by @ahg-g in #506
- fix log line by @ahg-g in #509
- update release version by @nirrozenbaum in #512
- Add nil option for metric_spec to specify metrics to not be scraped. by @BenjaminBraunDev in #503
- switch to using formal vllm-cpu image by @nirrozenbaum in #511
- cleanup logging by @kfswain in #514
- Rename ext_proc.yaml to inferencepool.yaml by @ahg-g in #515
- Bump the kubernetes group with 6 updates by @dependabot in #520
- Update extension-policy to match the new epp service name by @ahg-g in #522
- Bump github.com/prometheus/common from 0.62.0 to 0.63.0 by @dependabot in #519
- Refactor beforeSuite in integration tests by @ahg-g in #508
- Split the extension policy since it is envoy specific by @ahg-g in #524
- Docs: Uses tabs for quickstart model server options by @danehans in #527
- Add instructions to run benchmarks by @liu-cong in #480
- add helm template by @Kuromesi in #416
- bump vllm-cpu image to latest by @nirrozenbaum in #530
- removed hf token from cpu based example by @nirrozenbaum in #464
- Bump golang.org/x/net from 0.35.0 to 0.36.0 by @dependabot in #529
- Move benchmark under tools by @liu-cong in #534
- fixed rbac in helm chart by @ahg-g in #531
- Support full duplex streaming in body-based routing extension by @rramkumar1 in #463
- Simplifying EPP-side buffer by @kfswain in #538
- integration test stability improvements by @kfswain in #541
- Add inferencepool chart push mechanics by @ahg-g in #540
- Updated the image used for cloudbuild by @ahg-g in #542
- setting gotoolchain to auto by @ahg-g in #543
- Simplify body streaming for BBR by @rramkumar1 in #544
- Bug fix: Initialize RequestReceivedTimestamp by @liu-cong in #539
- [Metrics] Handle vLLM streaming response in streaming server by @JeffLuoo in #518
- Add some more unit tests for BBR by @rramkumar1 in #545
- Tag the main version of the helm chart with v0 by @ahg-g in #547
- Default to streaming mode by @ahg-g in #552
- Initial helm chart for bbr by @rramkumar1 in #546
- Add makefile configs for bbr helm chart by @rramkumar1 in #553
- Adding deprecation notice of BUFFERED mode on patch policy. by @kfswain in #560
- Allow bodyless requests to passthrough EPP by @kfswain in #555
- remove controller-runtime dependency from API by @kfswain in #565
- Swapping out flow image by @kfswain in #562
- Update boilerplate template by @kfswain in #566
- Allow partial metric updates by @liu-cong in #561
- Removing unsafe lib by switching to atomic.Pointer by @kfswain in #567
- Bump google.golang.org/protobuf from 1.36.5 to 1.36.6 by @dependabot in #568
- Bump github.com/onsi/gomega from 1.36.2 to 1.36.3 by @dependabot in #569
- Bump sigs.k8s.io/controller-runtime from 0.20.3 to 0.20.4 by @dependabot in #570
- Configure the vllm deployment with best practices for startup by @smarterclayton in #550
- Configure gpu-deployment.yaml to force vLLM v1 with LoRA by @smarterclayton in #573
- Cleanup logging in the request scheduling path by @ahg-g in #583
- minor update to Makefile by @nirrozenbaum in #588
- Adding printer columns to inference model by @kfswain in #574
- Add provider-specific manifests for BBR helm chart by @rramkumar1 in #585
- helm-improvements by @LiorLieberman in #590
- Setting zap to emit logs as JSON in the deployment. by @kfswain in #591
- Updating llama 2 7b to llama 3.1 8b Instruct and adding new LoRA adapters by @kfswain in #578
- Renaming resources to better mirror how names are expected to be used by @kfswain in #592
- update algorithm parameters from env variables by @kaushikmitr in #580
- update benchmarking guide with latest results with vllm v1 by @kaushikmitr in #559
- Added provider support to InferencePool helm chart by @ahg-g in #595
- make dynamic lora sidecar health check parameters configurable and for...
v0.2.0
What's Changed
- Revert "Replace EndpointSlice reconciler with pod list backed by informers" by @kfswain in #301
- Fixing small linter complaints by @kfswain in #302
- In hermetic test, add additional test cases and move k8sClient object creation so it's called once for all tests by @BenjaminBraunDev in #278
- [Metrics] Add average kv cache and waiting queue size metrics for inference pool by @JeffLuoo in #304
- Move getting started guide to docs site by @kfswain in #308
- site-source: Fix 'Bakcground' misspell in API concepts page by @timflannagan in #309
- Mkdocs fixes by @kfswain in #314
- Bump google.golang.org/protobuf from 1.36.4 to 1.36.5 by @dependabot in #315
- Remove gci linter by @ahg-g in #317
- fix: adds ErrorNotFound Handling for InferenceModel Reconciler by @danehans in #286
- site-src: Replace k8sgateway with kgateway & fix spelling in roles-and-personas.md by @timflannagan in #311
- Fix: Go Mod Imports by @danehans in #318
- Updates EPP Deployment and Release Doc/Script by @danehans in #322
- Delete InferenceModels from the datastore when deletionTimestamp is set by @ahg-g in #319
- Actually init logging using Zap by @tchap in #267
- Remove fatal log calls in executable code by @tchap in #265
- feat: Adds e2e test script by @danehans in #294
- Replacing endpointSlice Reconciler with a direct Pod Reconciler by @kfswain in #300
- Move manager from runserver to main by @tchap in #331
- feat: adds image-load and kind-load Make targets by @danehans in #288
- Use structured logging by @tchap in #330
- Add TLS support with self-signed certificate. by @ahg-g in #335
- Lora syncer docs by @coolkp in #320
- Fix cloudbuild rule for the LoRA syncer image by @ahg-g in #339
- fix: Corrects release branch naming by @danehans in #333
- Use contextual logging by @tchap in #337
- Bump the kubernetes group with 6 updates by @dependabot in #351
- Bump sigs.k8s.io/controller-runtime from 0.20.1 to 0.20.2 by @dependabot in #352
- Fixes to the adapter rollouts guide by @ahg-g in #338
- Consolidating all storage behind datastore by @ahg-g in #350
- fixed a typo - close a bash markdown by @nirrozenbaum in #364
- Added controller and datastore package by @hzxuzhonghu in #363
- Move pkg/ext-proc -> cmd/ext-proc by @tchap in #368
- added license header to all .go files by @nirrozenbaum in #370
- fix inference extension not correctly scrape pod metrics by @Kuromesi in #366
- Move pkg/manifests -> config/manifests by @tchap in #371
- [Metrics] Add request error metrics by @JeffLuoo in #269
- Rename pkg/ext-proc to pkg/epp by @tchap in #372
- Move pkg/ext-proc/metrics/README.md -> site-src/guides/metrics.md by @courageJ in #373
- Defining an outer metadata struct as part of the extproc endpoint picking protocol by @ahg-g in #377
- Draft a revised README.md by @smarterclayton in #374
- Add README.md file to the epp pkg by @ahg-g in #386
- Split the proxy and model server protocols for easy reference by @ahg-g in #387
- [Metric] Add inference pool and request error metrics to the dashboard by @JeffLuoo in #389
- Switch to gcr.io/distroless/static:nonroot base image by @ahg-g in #384
- fix context canceled recv error handling by @Kuromesi in #390
- Added endpoint picker diagram by @ahg-g in #396
- Added v1alpha2 api by @hzxuzhonghu in #398
- Adding a roadmap to README by @kfswain in #400
- Bump github.com/prometheus/client_golang from 1.20.5 to 1.21.0 by @dependabot in #402
- Bump github.com/google/go-cmp from 0.6.0 to 0.7.0 by @dependabot in #403
- updated logging in inferencepool reconciler by @nirrozenbaum in #399
- added inferencemodel predicate + minor changes in logging by @nirrozenbaum in #397
- Syncing getting started guide all to main by @kfswain in #410
- fixed typo in filepath in website guide page by @nirrozenbaum in #412
- Fix InferenceModel deletion logic by @ahg-g in #393
- Updated yamls to use v1alpha2 by @ahg-g in #420
- Rm v1alpha1 api by @hzxuzhonghu in #405
- removed the EndpointPickerNotHealthy condition form pool status by @ahg-g in #421
- [Metrics] Add metrics validation in integration test by @JeffLuoo in #413
- predicate follow up PR to remove the check from Reconcile func by @nirrozenbaum in #418
- Mis cleanup by @hzxuzhonghu in #428
- fix metric scrape port not updated when inference pool target port updated by @Kuromesi in #417
- make ModelName immutable and fix model weight by @hzxuzhonghu in #427
- Consistent validation for reference types by @robscott in #430
- create pods during integration tests by @Kuromesi in #431
- fix typos by @nirrozenbaum in #433
- Adding Accepted and ResolvedRefs conditions to InferencePool by @robscott in #446
- Add code for Envoy extension that supports body-to-header translation by @rramkumar1 in #355
- Add Makefile + cloudbuild configs for body-based routing extension by @rramkumar1 in #442
- added cpu based example by @nirrozenbaum in #436
- upda...
v0.2.0-rc
What's Changed
- Revert "Replace EndpointSlice reconciler with pod list backed by informers" by @kfswain in #301
- Fixing small linter complaints by @kfswain in #302
- In hermetic test, add additional test cases and move k8sClient object creation so it's called once for all tests by @BenjaminBraunDev in #278
- [Metrics] Add average kv cache and waiting queue size metrics for inference pool by @JeffLuoo in #304
- Move getting started guide to docs site by @kfswain in #308
- site-source: Fix 'Bakcground' misspell in API concepts page by @timflannagan in #309
- Mkdocs fixes by @kfswain in #314
- Bump google.golang.org/protobuf from 1.36.4 to 1.36.5 by @dependabot in #315
- Remove gci linter by @ahg-g in #317
- fix: adds ErrorNotFound Handling for InferenceModel Reconciler by @danehans in #286
- site-src: Replace k8sgateway with kgateway & fix spelling in roles-and-personas.md by @timflannagan in #311
- Fix: Go Mod Imports by @danehans in #318
- Updates EPP Deployment and Release Doc/Script by @danehans in #322
- Delete InferenceModels from the datastore when deletionTimestamp is set by @ahg-g in #319
- Actually init logging using Zap by @tchap in #267
- Remove fatal log calls in executable code by @tchap in #265
- feat: Adds e2e test script by @danehans in #294
- Replacing endpointSlice Reconciler with a direct Pod Reconciler by @kfswain in #300
- Move manager from runserver to main by @tchap in #331
- feat: adds image-load and kind-load Make targets by @danehans in #288
- Use structured logging by @tchap in #330
- Add TLS support with self-signed certificate. by @ahg-g in #335
- Lora syncer docs by @coolkp in #320
- Fix cloudbuild rule for the LoRA syncer image by @ahg-g in #339
- fix: Corrects release branch naming by @danehans in #333
- Use contextual logging by @tchap in #337
- Bump the kubernetes group with 6 updates by @dependabot in #351
- Bump sigs.k8s.io/controller-runtime from 0.20.1 to 0.20.2 by @dependabot in #352
- Fixes to the adapter rollouts guide by @ahg-g in #338
- Consolidating all storage behind datastore by @ahg-g in #350
- fixed a typo - close a bash markdown by @nirrozenbaum in #364
- Added controller and datastore package by @hzxuzhonghu in #363
- Move pkg/ext-proc -> cmd/ext-proc by @tchap in #368
- added license header to all .go files by @nirrozenbaum in #370
- fix inference extension not correctly scrape pod metrics by @Kuromesi in #366
- Move pkg/manifests -> config/manifests by @tchap in #371
- [Metrics] Add request error metrics by @JeffLuoo in #269
- Rename pkg/ext-proc to pkg/epp by @tchap in #372
- Move pkg/ext-proc/metrics/README.md -> site-src/guides/metrics.md by @courageJ in #373
- Defining an outer metadata struct as part of the extproc endpoint picking protocol by @ahg-g in #377
- Draft a revised README.md by @smarterclayton in #374
- Add README.md file to the epp pkg by @ahg-g in #386
- Split the proxy and model server protocols for easy reference by @ahg-g in #387
- [Metric] Add inference pool and request error metrics to the dashboard by @JeffLuoo in #389
- Switch to gcr.io/distroless/static:nonroot base image by @ahg-g in #384
- fix context canceled recv error handling by @Kuromesi in #390
- Added endpoint picker diagram by @ahg-g in #396
- Added v1alpha2 api by @hzxuzhonghu in #398
- Adding a roadmap to README by @kfswain in #400
- Bump github.com/prometheus/client_golang from 1.20.5 to 1.21.0 by @dependabot in #402
- Bump github.com/google/go-cmp from 0.6.0 to 0.7.0 by @dependabot in #403
- updated logging in inferencepool reconciler by @nirrozenbaum in #399
- added inferencemodel predicate + minor changes in logging by @nirrozenbaum in #397
- Syncing getting started guide all to main by @kfswain in #410
- fixed typo in filepath in website guide page by @nirrozenbaum in #412
- Fix InferenceModel deletion logic by @ahg-g in #393
- Updated yamls to use v1alpha2 by @ahg-g in #420
- Rm v1alpha1 api by @hzxuzhonghu in #405
- removed the EndpointPickerNotHealthy condition form pool status by @ahg-g in #421
- [Metrics] Add metrics validation in integration test by @JeffLuoo in #413
- predicate follow up PR to remove the check from Reconcile func by @nirrozenbaum in #418
- Mis cleanup by @hzxuzhonghu in #428
- fix metric scrape port not updated when inference pool target port updated by @Kuromesi in #417
- make ModelName immutable and fix model weight by @hzxuzhonghu in #427
- Consistent validation for reference types by @robscott in #430
- create pods during integration tests by @Kuromesi in #431
- fix typos by @nirrozenbaum in #433
- Adding Accepted and ResolvedRefs conditions to InferencePool by @robscott in #446
- Add code for Envoy extension that supports body-to-header translation by @rramkumar1 in #355
- Add Makefile + cloudbuild configs for body-based routing extension by @rramkumar1 in #442
- added cpu based example by @nirrozenbaum in #436
- upda...
v0.1.0
API version: v1alpha1
We are excited to announce the v0.1.0 release of the Kubernetes Gateway API Inference Extension. This release is intended for early adopters and the community to begin integrating and testing the new APIs.
Thank you to all the contributors for helping us deliver this release and for shaping the future of this project!
Getting Started
If you'd like to jump right in, head here!
What we support
GIE v0.1.0 was developed on:
- vLLM v0.7.1
- Envoy Gateway v1.2.1 (or higher)
- k8s v1.31
More model servers and gateway implementations are coming soon!
Note: Model servers seeking to support GIE should implement our model server protocol here. Any feedback on the protocol or adoption process is very welcome!
Note: v0.1.0 was necessary to enable Gateways to begin adopting this tooling. Any Gateway implementation that supports ext-proc & the Gateway API will be able to support GIE.
Disclaimers
- Not for Production: This release candidate is provided solely for evaluation, testing, and feedback. We advise against using it in production or building products on top of it, as there may be breaking changes before the final release.
- Feedback Welcome: Your experiences and feedback are invaluable. Please share any issues or suggestions via GitHub Issues to help us improve the project.
What's Changed
- Owners addition by @kfswain in #2
- proposed repo structure + copy of initial proposal by @kfswain in #1
- Repo structure by @kfswain in #3
- Update OWNERS by @smarterclayton in #6
- PoC implementation by @kfswain in #4
- Fix build for ext-proc example by @terrytangyuan in #7
- Simplify POC installation by @liu-cong in #8
- docs: poc markdown improvements by @Xunzhuo in #9
- fix: inconsistent secret key with deployment by @Xunzhuo in #11
- Updating top level README by @kfswain in #13
- API Proposal by @kfswain in #5
- Add initial ext proc implementation with LoRA affinity by @liu-cong in #14
- Improve the filter to return multiple preferred pods instead of one; also fix metrics update bug by @liu-cong in #17
- Envoy update by @kfswain in #18
- CRD implementation by @kfswain in #20
- Refactor: Define PodMetricsClient interface and hide implementation details of vllm metrics processing by @liu-cong in #26
- Add priority based scheduling by @liu-cong in #25
- Update vllm deployment example to use 1 GPU as tensor parallelism is 1 by @liu-cong in #28
- Add a hermetic e2e test with fake backend pods by @liu-cong in #29
- Fix mutierr appending; add a unit test. by @liu-cong in #33
- Some minor fixes in Envoy setup by @liu-cong in #35
- Update targetModel in request body by @liu-cong in #37
- Adding circuit breaker and timeout layers to avoid Gateway 5xx errors. by @kfswain in #39
- Simulation code for llm inference gateway by @kaushikmitr in #15
- Add myself to approvers by @kfswain in #42
- Dynamic lora load/unload sidecar by @coolkp in #31
- LLMServerPool Implementation by @kfswain in #36
- Repo cleanup by @kfswain in #46
- Updating API and generating code by @kfswain in #47
- Do not fail Init if fetch metrics fails. It can recover gracefully. by @liu-cong in #51
- llmservice reconciler implementation by @kfswain in #48
- Update README.md by @BenTheElder in #52
- Fixing hermetic_test, small formatting changes by @kfswain in #53
- Add myself to reviewers by @liu-cong in #40
- Add dependency updates by @robert-cronin in #57
- Bump the kubernetes group with 4 updates by @dependabot in #58
- Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.22.0 by @dependabot in #61
- Bump github.com/onsi/gomega from 1.33.1 to 1.36.0 by @dependabot in #62
- Bump github.com/prometheus/common from 0.55.0 to 0.60.1 by @dependabot in #60
- Bump google.golang.org/grpc from 1.65.0 to 1.68.0 by @dependabot in #59
- Fixing Groupversion by @kfswain in #63
- Integrating LLMService with weight splitting by @kfswain in #64
- Fix build and test by @liu-cong in #65
- Makefile fixes with generated output by @kfswain in #67
- Manifest updates by @kaushikmitr in #81
- Enhancements to LLM Instance Gateway: Scheduling Logic, and Documentation Updates by @kaushikmitr in #78
- Bug fixes: 1. NPE when model is not found 2. Port is considered 0 when LLMServerPool is not initialized by @liu-cong in #79
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.1 to 4.4.3 by @dependabot in #82
- Bump google.golang.org/protobuf from 1.35.1 to 1.35.2 by @dependabot in #83
- Bump github.com/envoyproxy/go-control-plane from 0.13.0 to 0.13.1 by @dependabot in #86
- Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.19.3 by @dependabot in #84
- Bump github.com/prometheus/common from 0.60.1 to 0.61.0 by @dependabot in #85
- Proposal update for the API names and latency objective by @ahg-g in #91
- Adding simple cloudbuild file that builds, tags, and pushes the docker image by @kfswain in #94
- switch to using upstream vllm with new metric by @coolkp in #54
- Updating cloudbuild to have image name by @kfswain in #106
- Bump github.com/onsi/gomega from 1.36.0 to 1.36.1 by @dependabot in #105
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.3 to 4.5.0 by @dependabot in #102
- Bump google.golang.org/grpc from 1.68.0 to 1.69.0 by @dependabot in #103
- Bump the kubernetes group with 4 updates by @dependabot in https...
v0.1.0-rc.1
API version: v1alpha1
We are excited to announce the v0.1.0-rc.1 release candidate of the Kubernetes Gateway API Inference Extension. This release is intended for early adopters and the community to begin integrating and testing the new APIs. Please note the following:
- Not for Production: This release candidate is provided solely for evaluation, testing, and feedback. We strongly advise against using it in production or building products on top of it, as there may be breaking changes before the final release.
- Feedback Welcome: Your experiences and feedback are invaluable. Please share any issues or suggestions via GitHub Issues to help us improve the project.
Thank you to all the contributors for helping us deliver this release and for shaping the future of this project!
What's Changed
- Owners addition by @kfswain in #2
- proposed repo structure + copy of initial proposal by @kfswain in #1
- Repo structure by @kfswain in #3
- Update OWNERS by @smarterclayton in #6
- PoC implementation by @kfswain in #4
- Fix build for ext-proc example by @terrytangyuan in #7
- Simplify POC installation by @liu-cong in #8
- docs: poc markdown improvements by @Xunzhuo in #9
- fix: inconsistent secret key with deployment by @Xunzhuo in #11
- Updating top level README by @kfswain in #13
- API Proposal by @kfswain in #5
- Add initial ext proc implementation with LoRA affinity by @liu-cong in #14
- Improve the filter to return multiple preferred pods instead of one; also fix metrics update bug by @liu-cong in #17
- Envoy update by @kfswain in #18
- CRD implementation by @kfswain in #20
- Refactor: Define PodMetricsClient interface and hide implementation details of vllm metrics processing by @liu-cong in #26
- Add priority based scheduling by @liu-cong in #25
- Update vllm deployment example to use 1 GPU as tensor parallelism is 1 by @liu-cong in #28
- Add a hermetic e2e test with fake backend pods by @liu-cong in #29
- Fix mutierr appending; add a unit test. by @liu-cong in #33
- Some minor fixes in Envoy setup by @liu-cong in #35
- Update targetModel in request body by @liu-cong in #37
- Adding circuit breaker and timeout layers to avoid Gateway 5xx errors. by @kfswain in #39
- Simulation code for llm inference gateway by @kaushikmitr in #15
- Add myself to approvers by @kfswain in #42
- Dynamic lora load/unload sidecar by @coolkp in #31
- LLMServerPool Implementation by @kfswain in #36
- Repo cleanup by @kfswain in #46
- Updating API and generating code by @kfswain in #47
- Do not fail Init if fetch metrics fails. It can recover gracefully. by @liu-cong in #51
- llmservice reconciler implementation by @kfswain in #48
- Update README.md by @BenTheElder in #52
- Fixing hermetic_test, small formatting changes by @kfswain in #53
- Add myself to reviewers by @liu-cong in #40
- Add dependency updates by @robert-cronin in #57
- Bump the kubernetes group with 4 updates by @dependabot in #58
- Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.22.0 by @dependabot in #61
- Bump github.com/onsi/gomega from 1.33.1 to 1.36.0 by @dependabot in #62
- Bump github.com/prometheus/common from 0.55.0 to 0.60.1 by @dependabot in #60
- Bump google.golang.org/grpc from 1.65.0 to 1.68.0 by @dependabot in #59
- Fixing Groupversion by @kfswain in #63
- Integrating LLMService with weight splitting by @kfswain in #64
- Fix build and test by @liu-cong in #65
- Makefile fixes with generated output by @kfswain in #67
- Manifest updates by @kaushikmitr in #81
- Enhancements to LLM Instance Gateway: Scheduling Logic, and Documentation Updates by @kaushikmitr in #78
- Bug fixes: 1. NPE when model is not found 2. Port is considered 0 when LLMServerPool is not initialized by @liu-cong in #79
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.1 to 4.4.3 by @dependabot in #82
- Bump google.golang.org/protobuf from 1.35.1 to 1.35.2 by @dependabot in #83
- Bump github.com/envoyproxy/go-control-plane from 0.13.0 to 0.13.1 by @dependabot in #86
- Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.19.3 by @dependabot in #84
- Bump github.com/prometheus/common from 0.60.1 to 0.61.0 by @dependabot in #85
- Proposal update for the API names and latency objective by @ahg-g in #91
- Adding simple cloudbuild file that builds, tags, and pushes the docker image by @kfswain in #94
- switch to using upstream vllm with new metric by @coolkp in #54
- Updating cloudbuild to have image name by @kfswain in #106
- Bump github.com/onsi/gomega from 1.36.0 to 1.36.1 by @dependabot in #105
- Bump sigs.k8s.io/structured-merge-diff/v4 from 4.4.3 to 4.5.0 by @dependabot in #102
- Bump google.golang.org/grpc from 1.68.0 to 1.69.0 by @dependabot in #103
- Bump the kubernetes group with 4 updates by @dependabot in #101
- Bump google.golang.org/protobuf from 1.35.2 to 1.36.0 by @dependabot in #104
- Change from SIG Apps to SIG Network by @terrytangyuan in #92
- Add response body handler by @liu-cong in #90
- API Shift/Refactor by @kfswain in #93
- API compliance fix and build fixes by @kfswain in #114
- Added a verify rule to Makefile by @ahg-g in #122
- update the linter version by @ahg-g in https://github.com/kubernetes-sigs/gatew...