InformerEventSource cannot find resource after some time #1723
Comments
So if I understand correctly, the resource was found before. It is not the case that the informer never received the resource. I checked, and this part of the code is very simple on our side: it basically just reads the resource from the informer cache. @manusa @shawkins, have you encountered this problem before?
This was likely captured as fabric8io/kubernetes-client#4781 as well. We can work on the upstream side first, based on the comment over there.
I discussed this with @gyfora before; this seems to be a different issue. TBH, I can't imagine how a resource could be removed from the cache (ItemStore) without a delete event. But yes, let's continue on the fabric8 client side.
It shouldn't be possible for an item to have existed and then stop existing without a delete event being emitted, at least at the informer level. The only circumstances in which an entry is removed are a delete event from the watch, or a relist (which should be rare in an environment where bookmarks are supported) in which the item no longer exists. Is it possible that the item was known to / cached by the Operator SDK and was never populated in the informer cache to begin with?
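To make the argument above concrete, here is a minimal, hypothetical model of an informer's item store in plain Java. It is not fabric8's `ItemStore` API; `InformerCacheModel` and its methods are illustrative names. It only encodes the two removal paths described in the comment: a delete event from the watch, or a relist whose result no longer contains the item.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/**
 * Hypothetical, simplified model of an informer item store (NOT fabric8's ItemStore).
 * Entries can leave the store only via a DELETED event or a relist that omits them.
 */
final class InformerCacheModel<T> {
    // Keyed by "namespace/name", mirroring how informer caches usually key items.
    private final Map<String, T> items = new HashMap<>();

    /** ADDED or MODIFIED watch event: insert or overwrite the entry. */
    void onAddOrUpdate(String key, T item) {
        items.put(key, item);
    }

    /** DELETED watch event: first removal path. */
    void onDelete(String key) {
        items.remove(key);
    }

    /**
     * Relist: replace the store contents with a fresh list result (second removal path).
     * Returns the keys that were dropped because they are absent from the new list.
     */
    Set<String> onRelist(Map<String, T> freshList) {
        Set<String> dropped = new HashSet<>(items.keySet());
        dropped.removeAll(freshList.keySet());
        items.clear();
        items.putAll(freshList);
        return dropped;
    }

    Optional<T> get(String key) {
        return Optional.ofNullable(items.get(key));
    }
}
```

Under this model, a Deployment that still exists in the cluster can only go missing from the cache through a spurious delete or a relist against data that lacks it, while a fresh start simply repopulates the store from a new list.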
No, that is not possible. JOSDK reads resources from the informer cache. In this case there is another layer that maps between the primary custom resource and the secondary resource (this is where I added logging). But if the resource was found before, that is also not possible.
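The two layers mentioned above can be sketched as a hypothetical model that reports which layer a failed lookup comes from. `SecondaryLookupModel` and its `Outcome` values are illustrative names, not JOSDK classes. Logging this distinction is one way to tell a lost primary-to-secondary mapping apart from a resource that vanished from the informer cache.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical model (not a JOSDK class) of a secondary-resource lookup with two layers:
 * a primary-to-secondary mapping on top of the informer cache.
 */
final class SecondaryLookupModel {
    /** Which layer, if any, failed during a lookup. */
    enum Outcome { FOUND, NO_MAPPING, MAPPED_BUT_NOT_CACHED }

    // Layer 1: informer cache, keyed by secondary resource ID ("namespace/name").
    private final Map<String, String> informerCache = new HashMap<>();
    // Layer 2: mapping from primary custom resource ID to secondary resource IDs.
    private final Map<String, Set<String>> primaryToSecondary = new HashMap<>();

    void cachePut(String secondaryId, String resource) { informerCache.put(secondaryId, resource); }

    void cacheRemove(String secondaryId) { informerCache.remove(secondaryId); }

    void link(String primaryId, String secondaryId) {
        primaryToSecondary.computeIfAbsent(primaryId, k -> new HashSet<>()).add(secondaryId);
    }

    /** Resolves a primary ID and reports which layer was responsible for a miss. */
    Outcome lookup(String primaryId) {
        Set<String> secondaryIds = primaryToSecondary.getOrDefault(primaryId, Set.of());
        if (secondaryIds.isEmpty()) {
            return Outcome.NO_MAPPING; // the mapping layer knows nothing about this primary
        }
        for (String id : secondaryIds) {
            if (informerCache.containsKey(id)) {
                return Outcome.FOUND;
            }
        }
        return Outcome.MAPPED_BUT_NOT_CACHED; // mapping exists, but the cache lost the resource
    }
}
```

If the mapping was present and stays present, a later `MAPPED_BUT_NOT_CACHED` points at the cache layer, which is the case the comment argues should not happen.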
I think we will need logs for this; I was not able to think of any scenario in which this could happen.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days. |
This issue was closed because it has been stalled for 14 days with no activity. |
Bug Report
What did you do?
We are using a simple label-selector-based informer in the Flink Kubernetes Operator: https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/utils/EventSourceUtils.java#L45
In some cases, after a while, the informer could no longer find the target object (a Deployment), even though it definitely existed in Kubernetes (verified manually). Restarting the operator solved the problem.
Based on this, we suspect that the informer simply stopped receiving new events after a while and never recovered.
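If the suspicion is that the informer silently stopped receiving events, one way to gather evidence is a small staleness watchdog fed from the event handler callbacks. The sketch below is hypothetical (`InformerStalenessWatchdog` is not part of JOSDK or fabric8) and assumes the caller invokes `recordEvent()` from every add/update/delete callback. A quiet namespace also produces no events, so a positive result is only a hint to inspect the informer, not proof that the watch is broken.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical diagnostic helper (not part of JOSDK or fabric8): remembers when an informer
 * last delivered an event and reports whether it has been silent longer than a threshold.
 */
final class InformerStalenessWatchdog {
    private final Supplier<Instant> clock;   // e.g. Instant::now in production, injectable for tests
    private final Duration threshold;        // maximum tolerated silence
    private final AtomicReference<Instant> lastEvent;

    InformerStalenessWatchdog(Supplier<Instant> clock, Duration threshold) {
        this.clock = clock;
        this.threshold = threshold;
        this.lastEvent = new AtomicReference<>(clock.get()); // start the silence timer at creation
    }

    /** Call from every add/update/delete callback of the informer's event handler. */
    void recordEvent() {
        lastEvent.set(clock.get());
    }

    /** True if strictly more than the threshold has elapsed since the last recorded event. */
    boolean isPossiblyStale() {
        Duration silence = Duration.between(lastEvent.get(), clock.get());
        return silence.compareTo(threshold) > 0;
    }
}
```

Logging `isPossiblyStale()` periodically alongside the "resource not found" lookups would show whether misses coincide with a silent informer.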
Environment
Josdk: 4.1.1
Java 11