-
Notifications
You must be signed in to change notification settings - Fork 61
added inferencemodel predicate + minor changes in logging #397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,7 +26,9 @@ import ( | |
"k8s.io/client-go/tools/record" | ||
ctrl "sigs.k8s.io/controller-runtime" | ||
"sigs.k8s.io/controller-runtime/pkg/client" | ||
"sigs.k8s.io/controller-runtime/pkg/event" | ||
"sigs.k8s.io/controller-runtime/pkg/log" | ||
"sigs.k8s.io/controller-runtime/pkg/predicate" | ||
"sigs.k8s.io/gateway-api-inference-extension/api/v1alpha2" | ||
"sigs.k8s.io/gateway-api-inference-extension/pkg/epp/datastore" | ||
logutil "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/util/logging" | ||
|
@@ -41,10 +43,6 @@ type InferenceModelReconciler struct { | |
} | ||
|
||
func (c *InferenceModelReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) { | ||
if req.Namespace != c.PoolNamespacedName.Namespace { | ||
return ctrl.Result{}, nil | ||
} | ||
|
||
logger := log.FromContext(ctx) | ||
loggerDefault := logger.V(logutil.DEFAULT) | ||
loggerDefault.Info("Reconciling InferenceModel", "name", req.NamespacedName) | ||
|
@@ -85,5 +83,17 @@ func (c *InferenceModelReconciler) updateDatastore(logger logr.Logger, infModel | |
func (c *InferenceModelReconciler) SetupWithManager(mgr ctrl.Manager) error { | ||
return ctrl.NewControllerManagedBy(mgr). | ||
For(&v1alpha2.InferenceModel{}). | ||
WithEventFilter(predicate.Funcs{ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am on the fence with this change. This filter is done client side, so in practice we are not significantly moving the needle by doing this at the scale of number of InferenceModels we expect in a cluster since Reconcile will be doing it anyway, and so this is repeating the same logic. But my biggest issue is that we don't have proper integration test coverage yet to test such changes, and we can't test this with unit tests. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
@ahg-g that is not so accurate. it might be the same logic, but it is done in a different place in the chain of calls. there are many good reasons to use the predicate. when using a predicate the event is filtered before its object is being inserted into the controller workingqueue. this is the earliest stage we can filter event if we know we're not interested in it. on the other hand, when NOT using the predicate and keeping the code as is, the object is pushed to the controller wokringqueue, which leads to unnecessary CPU and memory utilization as well as unnecessary API Server calls. using predicates can reduce the load on the API server and the controller's event watcher, which can be crucial in large clusters with many resources. in addition it might harm situations when using customized configurations of the controller, like using rate limiting. if the rate limiting is configured to have 'X' events per sec, the redundant events will reduce the performance significantly in case we filter in the Reconcile function, while if we use the predicate the object is not inserted into the workingqueue and therefore not taken into consideration when calculating rate limiting. another very good reason to use predicate is the separation of concerns. Using WithEventFilter is beneficial for keeping the Reconcile function focused on the actual business logic of handling resource state, rather than being concerned with filtering irrelevant events. there are more good reasons but to summarize - as for the other comment - of course I don't mind reverting other changes. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I looked for documentation on this but couldnt find any... And in order to corroborate any statement, can someone point to a source that verifies if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The filter that one can setup to run server side should be set via the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If that's the case, then there isn't an optimization, and we are actually using more lines of code to add complexity to understanding the reconciliation logic(A future reader could no longer read just the Reconcile func to understand the full context), right? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
In the current code, all Yes, with I agree it saves reconcile calls and lighten up the workqueue, but my question on whether this will actually move the needle is because how many updates/adds/deletes do we expect InferenceModel objects to have for this to make a difference (vs for example objects that get updated/created/deleted frequently, like Pods). In any case, I am not against this change, anywhere we can be more efficient and allow us to scale better is a good thing, but my main issue as I mentioned in the first comment was that we don't have proper integration test coverage yet to test such a change (specifically the delete and update paths), and we can't test it with unit tests. I know it is a small change, but I am starting to feel anxious about our low test coverage overall in this project, hence the concern. In #393 I tried to improve unit test coverage for the controllers and datastore pkgs for example. I think our integration tests need work to allow us to more easily add tests to cover paths that can't be covered with unit tests such as this one. Some of the work that I think we can do to improve integration tests is to allow in each test case to create/delete infModel and infPool objects, also Pods should be created on the api-server instead of injecting them in to the store directly. Bu I am ok to move forward with this if others don't feel strongly about the test coverage concern above. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I completely agree about the tests coverage, but I think it’s not related to this PR which obviously introduces improvement. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think the the time element has been consistently undervalued wrt controller logic. My argument is not against this PR specifically. It's that we have limited contribution resources, and dedicating them to discuss micro-optimizations I think does not meaningfully improve the project. Yes, this is better, but by how much? Based on all the time invested, will we actually see an RoI in this? How often will the EPP's performance be dictated by this bottleneck? Could we instead have been focused elsewhere and generate more value? @nirrozenbaum , you're contributions are greatly appreciated, and I'm happy to approve this PR, I just want to see us move towards more efforts that I think are a little more worthy of all you smart folks' time. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @kfswain right. I’ve done some small PRs to get familiar with the code. I definitely agree and plan to contribute in more meaningful places. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
it kinda is in the sense it adds code that is not covered in any test, I hope we can pull together here and improve the integration tests because this will allow us to do more changes with confidence :) I will leave the approval to @kfswain |
||
CreateFunc: func(e event.CreateEvent) bool { return c.eventPredicate(e.Object.(*v1alpha2.InferenceModel)) }, | ||
UpdateFunc: func(e event.UpdateEvent) bool { | ||
return c.eventPredicate(e.ObjectOld.(*v1alpha2.InferenceModel)) || c.eventPredicate(e.ObjectNew.(*v1alpha2.InferenceModel)) | ||
}, | ||
DeleteFunc: func(e event.DeleteEvent) bool { return c.eventPredicate(e.Object.(*v1alpha2.InferenceModel)) }, | ||
GenericFunc: func(e event.GenericEvent) bool { return c.eventPredicate(e.Object.(*v1alpha2.InferenceModel)) }, | ||
}). | ||
Complete(c) | ||
} | ||
|
||
func (c *InferenceModelReconciler) eventPredicate(infModel *v1alpha2.InferenceModel) bool { | ||
return (infModel.Spec.PoolRef.Name == c.PoolNamespacedName.Name) && (infModel.GetNamespace() == c.PoolNamespacedName.Namespace) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the articulation here, tyty.