removing endpointsliceReconciler and replacing with pod #268
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: kfswain. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify site configuration.
Record record.EventRecorder
}

func (c *PodReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
I think a reconciler on pods is not necessary; the pods are already cached in the informer cache, and this reconciler seems to just replicate what the informer cache does.
Calling a List with the provided selector should achieve the same effect as getting the list of pods from the datastore.
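For illustration, a minimal sketch of that suggestion, assuming a cache-backed controller-runtime client and treating the selector as a plain label map (names here are illustrative, not the project's actual API):

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// listPoolPods lists the pods matching the pool's selector directly from the
// controller-runtime client. When the client comes from a manager, it reads
// from the shared informer cache, so this is an in-memory lookup rather than
// an API-server round trip.
func listPoolPods(ctx context.Context, c client.Client, namespace string, selector map[string]string) ([]corev1.Pod, error) {
	var pods corev1.PodList
	if err := c.List(ctx, &pods,
		client.InNamespace(namespace),
		client.MatchingLabels(selector),
	); err != nil {
		return nil, err
	}
	return pods.Items, nil
}
```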
Another issue we will face is missed pod events (e.g., due to delayed InferencePool creation, selector updates, or ext-proc restarts), which will result in a stale datastore.
Name: k8sPod.Name,
Address: k8sPod.Status.PodIP + ":" + strconv.Itoa(int(inferencePool.Spec.TargetPortNumber)),
}
if !c.Datastore.LabelsMatch(k8sPod.ObjectMeta.Labels) || !podIsReady(k8sPod) {
I think we should only consider pods in the same namespace as the pool. I believe this is the default k8s Service behavior.
Fair point. The previous endpointSliceReconciler just used the Service it was given, and so did not need to scope to a namespace. I can update this should we choose to go down this path.
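If we do go that way, a minimal sketch of the namespace scoping, with the pod and pool names assumed from the diff above rather than taken from the actual code:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// inPoolNamespace reports whether the pod lives in the same namespace as the
// InferencePool, mirroring how a Kubernetes Service only selects pods in its
// own namespace.
func inPoolNamespace(pod *corev1.Pod, pool metav1.Object) bool {
	return pod.Namespace == pool.GetNamespace()
}
```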
Address: k8sPod.Status.PodIP + ":" + strconv.Itoa(int(inferencePool.Spec.TargetPortNumber)),
}
if !c.Datastore.LabelsMatch(k8sPod.ObjectMeta.Labels) || !podIsReady(k8sPod) {
c.Datastore.pods.Delete(pod)
nit: I believe deleting a non-existent pod is just a no-op, so it's safe to just delete here. Consider adding a comment noting that.
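For reference, a tiny standalone illustration of that property, assuming `Datastore.pods` is a `sync.Map` (deleting an absent key is defined to be a no-op):

```go
package example

import "sync"

// Delete on a sync.Map is safe even when the key was never stored, so the
// reconciler can call Delete unconditionally for non-matching or not-ready
// pods without checking for existence first.
func deleteIfPresent(pods *sync.Map, key string) {
	pods.Delete(key) // no-op if key is absent; no error, no panic
}
```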
if condition.Status == corev1.ConditionTrue {
return true
}
break
break
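For context on the snippet above, a sketch of what the full readiness helper might look like; returning from the loop directly makes the trailing break unnecessary (this assumes the condition of interest is PodReady, which is not stated explicitly in the diff shown here):

```go
package example

import corev1 "k8s.io/api/core/v1"

// podIsReady returns true only when the pod's Ready condition is True.
// Returning the comparison directly avoids the break after the if block.
func podIsReady(pod *corev1.Pod) bool {
	for _, condition := range pod.Status.Conditions {
		if condition.Type == corev1.PodReady {
			return condition.Status == corev1.ConditionTrue
		}
	}
	return false
}
```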
if !c.Datastore.LabelsMatch(k8sPod.ObjectMeta.Labels) || !podIsReady(k8sPod) {
c.Datastore.pods.Delete(pod)
} else {
c.Datastore.pods.Store(pod, true)
nit: Add some logs here
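A sketch of the kind of logging being suggested, using the request-scoped logger that controller-runtime stores in the context (helper name and verbosity level are illustrative):

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/log"
)

// logDatastoreUpdate records whether a pod was added to or removed from the
// datastore, so cache churn is visible when debugging staleness issues.
func logDatastoreUpdate(ctx context.Context, pod *corev1.Pod, stored bool) {
	logger := log.FromContext(ctx)
	if stored {
		logger.V(1).Info("stored pod in datastore", "pod", pod.Name, "namespace", pod.Namespace)
		return
	}
	logger.V(1).Info("removed pod from datastore", "pod", pod.Name, "namespace", pod.Namespace)
}
```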
#271 supersedes this PR

/hold
There are a couple of issues with this approach: scalability and data staleness. Regarding scalability, this approach lists all pods in the cluster and does the filtering client-side. We can address that as a follow-up, so it is not my biggest concern. My main concern is data staleness: this approach will cause missed pod events and so will result in stale cache data. Here are some examples:

1. The InferencePool is created after the pods, so the existing pods are reconciled before the pool's selector is known.
2. The ext-proc restarts, which dumps the cache.
3. The InferencePool selector is updated, so previously reconciled pods are not re-evaluated against the new selector.
In such cases, our cache will not be fresh until the pods get an update event to trigger another reconciliation on the pods, which in most cases will not happen soon enough. There is no escape from listing to address such issues, for example listing on InferencePool reconciliation events. As suggested in the original issue #256, we need to list. Since time is of the essence for this release, I worked in parallel on a solution where we rely on the underlying informer cache and the provider lists from it directly: #271. @kfswain, are you aligned with the above?
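A rough sketch of the "list on InferencePool reconciliation" idea, assuming the pod cache is a sync.Map keyed by pod name and the selector is a plain label map (both assumptions, not the actual datastore API):

```go
package example

import (
	"context"
	"sync"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// refreshPoolPods rebuilds the pod cache from a selector-scoped List on every
// InferencePool reconcile, so missed pod events cannot leave the cache stale.
func refreshPoolPods(ctx context.Context, c client.Client, pods *sync.Map, namespace string, selector map[string]string) error {
	var list corev1.PodList
	if err := c.List(ctx, &list,
		client.InNamespace(namespace),
		client.MatchingLabels(selector),
	); err != nil {
		return err
	}
	// Drop everything first so deleted or no-longer-matching pods disappear.
	pods.Range(func(key, _ any) bool {
		pods.Delete(key)
		return true
	})
	for i := range list.Items {
		pods.Store(list.Items[i].Name, &list.Items[i])
	}
	return nil
}
```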
I don't think so. Points 1 & 3 are essentially the same event, and both require the InferencePool data, as it has the selector, so we would just requeue. Point 2 is a cache-dumping event, for sure, but we won't know that happened until we reconcile on the InferencePool anyway, so we could just dump the cache in that same reconciliation loop and list the pods using the new selector as a filter (so we aren't fetching all pods). Sure, we still catch all events for all pods in the cluster, but that could be YAGNI'd until we hit that wall. I won't die on this hill, but I do wonder why we didn't stick with 790d3a2, which would have done all of this (the label selector we use to queue pods is pulled from the freshest InferencePool data), with the added benefit that it would all have been done in October. Will leave comments in the other PR.
I agree on points 1 & 3; they are addressed with requeuing (I missed it in the code), but that is also expensive since it constantly requeues every pod in the cluster until the InferencePool is synced. For point 2, we still need to list the pods on InferencePool reconcile events. But in any case, all three cases can be handled by listing the pods on InferencePool reconcile events (i.e., we don't need to requeue if we have to list anyway on InferencePool reconciliation), and I mentioned that in the above comment: "There is no escape from listing to address such issues. For example, listing on InferencePool reconciliation events." I understand that we will list only relevant pods, but as I clarify below, my pushback is about having to do all this and worry about whether we are handling all the cases needed to keep an unnecessary intermediate cache fresh.
The change in direction was motivated by the ease of use of controller-runtime over client-go for building reconcilers, since it abstracts away the complexities of managing worker queues and event registration (we had discussions on controller-runtime itself later, but that is beside the point). #271 doesn't reverse that; it argues for getting rid of the pod reconciler, which 790d3a2 also employs. The idea is to avoid worrying about handling pod events in favor of directly listing from the informer cache whenever we need to probe metrics; it gets rid of the pod cache in the datastore and hopefully will allow us to remove the other cache we currently have in the provider. Basically, my argument is that this PR and 790d3a2 build an intermediate cache that we need to worry about keeping fresh by handling pod and InferencePool events, while #271 completely avoids that and so should be more resilient. My previous experience with kube-scheduler and a number of workload controllers tells me we are better off avoiding reconciling on pods if not needed, for both scale and resiliency.
790d3a2 has the same issues as this PR.
The EPP is in a holding pattern until the InferencePool is synced anyway, though.
The only time we can't escape the list is on a selector change, which I do not expect to happen often, and that should happen on InferencePool reconciliation, the event in which it occurs.
It employs the same informer logic. Now we just have no separation of concerns and a substantial drop in readability. And we are using informers in addition to controller-runtime, so it's a strange hybrid of clients, which, again, drops our code health.
You still have to do this, and do this in #271. But yes, there was an intermediary cache in those PRs; that was a temporary solve because the provider.go file needed to be refactored anyway. I was making the smallest possible change without disturbing others' work, and then we could have a PR to tie them all together (which is why I use our locally defined …). The reconciler manages the keys in the map, and the metrics loop would populate the values for each key it finds (you do the same thing). They are on different control loops and manage different things. We have to have some sort of datastore (cache) because we need to keep the metrics; we can't scrape metrics for every request. No different than what is happening here: https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/271/files#diff-8776d043539aff8c6475caca3d5a29688d12345b9527a795bac1780b4e5dd158R54
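To make the two-loop pattern concrete, here is a sketch under the same assumptions: a sync.Map whose keys are managed by the pod reconciler and whose values a separate metrics loop fills in. The PodMetrics fields and scrape function are illustrative stand-ins, not the project's types:

```go
package example

import (
	"context"
	"sync"
	"time"
)

// PodMetrics is an illustrative value type; the reconciler owns which keys
// (pods) exist in the map, and the metrics loop owns the values.
type PodMetrics struct {
	WaitingRequests int
	KVCacheUsage    float64
}

// metricsRefreshLoop periodically scrapes every pod currently tracked in the
// map and stores the latest metrics, leaving key management to the reconciler.
func metricsRefreshLoop(ctx context.Context, pods *sync.Map, interval time.Duration, scrape func(podName string) (PodMetrics, error)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			pods.Range(func(key, _ any) bool {
				name := key.(string)
				if m, err := scrape(name); err == nil {
					pods.Store(name, m)
				}
				return true
			})
		}
	}
}
```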
I want to take a step back and summarize the design pattern I had in mind, since I don't think I stated it concretely: I was thinking of the informer as a cache for the pods we care about. It is always recent, listing pods from it is efficient, and it is a drop-in replacement for the cache we had in the datastore. Also, since there is a metrics refresh loop, I thought that loop would be enough to refresh the metrics cache with the latest state of the pods (ready, removed, added, etc.), and so we wouldn't need to set up a pod reconciler at all or think about which events require updating the cache.

I agree we can achieve the same result with a reconciliation on pods plus a full refresh of the metrics cache on InferencePool reconciliation. I wasn't sure initially whether that would cover all data staleness issues, but after this discussion I believe it does. So this PR only misses the refresh of the cache on InferencePool events, and I think that will also allow removing the requeueing. Removing the requeueing makes it possible to support multiple InferencePools in the future if we choose to.

I also agree that mixing client-go/informer and controller-runtime is not a good idea. I actually thought that controller-runtime uses informer underneath, but it turns out that it doesn't; controller-runtime has a similar cache library that can be used instead, conveniently called "Cache". I apologize for the back and forth, this has been a great learning experience!
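For reference, a sketch of the controller-runtime cache package mentioned above, used standalone to back selector-scoped pod lists. This is an assumption about how it could be wired, not the code in #271, and error handling is trimmed:

```go
package example

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/rest"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newPodLister starts a controller-runtime Cache and returns a function that
// lists pods for a given namespace/selector straight from that cache.
func newPodLister(ctx context.Context, cfg *rest.Config) (func(namespace string, selector map[string]string) ([]corev1.Pod, error), error) {
	c, err := cache.New(cfg, cache.Options{})
	if err != nil {
		return nil, err
	}
	go func() { _ = c.Start(ctx) }() // Start blocks, so run the informers in the background.
	if !c.WaitForCacheSync(ctx) {
		return nil, fmt.Errorf("cache did not sync")
	}
	return func(namespace string, selector map[string]string) ([]corev1.Pod, error) {
		var pods corev1.PodList
		if err := c.List(ctx, &pods, client.InNamespace(namespace), client.MatchingLabels(selector)); err != nil {
			return nil, err
		}
		return pods.Items, nil
	}, nil
}
```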
@kfswain and I discussed offline, and proposed to re-open this PR and revert the other one, since this one achieves the same behavior if we list and refresh the cache on InferencePool events. We can do that after the release. One thing to be cognizant of for the future is that it will be tricky to set up a server-side selector, because we need to have the pool to know the selector at the time of initializing the controller.

/reopen
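On the server-side selector point, a sketch of why the selector has to be known up front: with controller-runtime's cache options (v0.15+ API shown; treat field names as version-dependent), the label filter is baked into the manager's cache at construction time, before any InferencePool has been reconciled.

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newManagerWithPodSelector builds a manager whose pod watch is filtered
// server-side by the given selector. The selector must already be known here,
// which is the chicken-and-egg problem with deriving it from the pool.
func newManagerWithPodSelector(selector labels.Selector) (ctrl.Manager, error) {
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			ByObject: map[client.Object]cache.ByObject{
				&corev1.Pod{}: {Label: selector},
			},
		},
	})
}
```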
@ahg-g: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@kfswain: The following tests failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
No description provided.