fix: use update instead of replace in DR #2006

Merged
merged 2 commits into from
Aug 14, 2023

Conversation

csviri
Collaborator

@csviri csviri commented Aug 8, 2023

No description provided.

@csviri csviri requested a review from metacosm August 8, 2023 13:33
@csviri csviri self-assigned this Aug 8, 2023
@openshift-ci openshift-ci bot requested review from adam-sandor and andreaTP August 8, 2023 13:33
@csviri
Collaborator Author

csviri commented Aug 8, 2023

cc @shawkins

Collaborator

@shawkins shawkins left a comment

In the SSA case you are explicitly setting the resourceVersion. Are you expecting the user to have set the resourceVersion? If not, then this will be unlocked if the resourceVersion is null.

@csviri
Collaborator Author

csviri commented Aug 8, 2023

In the SSA case you are explicitly setting the resourceVersion. Are you expecting the user to have set the resourceVersion? If not, then this will be unlocked if the resourceVersion is null.

I don't think the resource version can be null, since here we update a clone of the actual resource (which already has a resource version).

@shawkins
Collaborator

shawkins commented Aug 8, 2023

I don't think the resource version can be null, since here we update a clone of the actual resource (which already has a resource version).

Are you saying that the user should know to leave the resourceVersion populated on the target they create in the desired method? And that if they set it to null, it will be an unlocked update? Shouldn't SSA work the same way?

My guess was that you are trying to force locking, so I'm proposing https://github.com/operator-framework/java-operator-sdk/pull/2005/files#diff-aa20588ab4b1ff4f171a897d4042d1b055c02737b067d09aaeb9cfd770adf3a0R131

@csviri
Collaborator Author

csviri commented Aug 8, 2023

https://github.com/java-operator-sdk/java-operator-sdk/blob/b99a8b7c32001c3aaeca740709472241d3918605/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/dependent/kubernetes/updatermatcher/GenericResourceUpdaterMatcher.java#L42-L47

In the generic case, the logic here is that we clone the actual resource (from the cache), replacing the spec, annotations, and labels. So the resource version is always present in the result.
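
To illustrate the clone-and-overlay idea described above, here is a minimal, dependency-free sketch. `SketchResource` and `GenericUpdateSketch` are hypothetical stand-ins invented for this example; they are not the fabric8 or Java Operator SDK types, and the linked `GenericResourceUpdaterMatcher` may differ in detail.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a Kubernetes resource (not a fabric8 type). */
class SketchResource {
  String resourceVersion; // assigned by the API server; null on a freshly built desired object
  Map<String, String> labels = new HashMap<>();
  Map<String, String> annotations = new HashMap<>();
  Map<String, Object> spec = new HashMap<>();
  Map<String, Object> status = new HashMap<>();

  /** Per-field copy of the maps, enough for this sketch. */
  SketchResource copy() {
    SketchResource c = new SketchResource();
    c.resourceVersion = resourceVersion;
    c.labels = new HashMap<>(labels);
    c.annotations = new HashMap<>(annotations);
    c.spec = new HashMap<>(spec);
    c.status = new HashMap<>(status);
    return c;
  }
}

class GenericUpdateSketch {
  /** Clone the cached actual resource, then take spec, labels and annotations from desired. */
  static SketchResource updateResource(SketchResource actual, SketchResource desired) {
    SketchResource result = actual.copy(); // keeps resourceVersion and status from the cache
    result.spec = new HashMap<>(desired.spec);
    result.labels = new HashMap<>(desired.labels);
    result.annotations = new HashMap<>(desired.annotations);
    return result;
  }

  public static void main(String[] args) {
    SketchResource actual = new SketchResource();
    actual.resourceVersion = "41";
    actual.spec.put("replicas", 1);

    SketchResource desired = new SketchResource(); // resourceVersion deliberately left null
    desired.spec.put("replicas", 3);

    SketchResource toSend = updateResource(actual, desired);
    System.out.println(toSend.resourceVersion + " " + toSend.spec);
  }
}
```

Because the result starts from the cached object, its resourceVersion comes from the cache even when the desired object never sets one, which is the point made in the comment above.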

@csviri
Copy link
Collaborator Author

csviri commented Aug 8, 2023

Shouldn't SSA work the same way?

No, since SSA is not aware of the current state. Normally there should be no optimistic locking for SSA; we only do it because of event recording. I will check, but we might be able to live without that too.

@csviri csviri requested a review from shawkins August 8, 2023 14:00
@metacosm
Collaborator

metacosm commented Aug 8, 2023

Which issue(s) is this supposed to address? Is this about patching vs. sending a full version of the resource we're trying to update?

@shawkins
Collaborator

shawkins commented Aug 8, 2023

In the generic case, the logic here is that we clone the actual resource (from the cache), replacing the spec, annotations, and labels. So the resource version is always present in the result.

Sorry, I hadn't realized there was that additional complexity. So basically, for every resource that lacks a spec, or for which you want to manipulate a subresource, you have to create a matcher. And if new mutable fields are added, they must also be added to the matcher.

No, since SSA is not aware of the current state. Normally there should be no optimistic locking for SSA; we only do it because of event recording. I will check, but we might be able to live without that too.

That's not what I was thinking. My comment was based on the possibility that the resourceVersion in the non-SSA case could be null.

@csviri
Collaborator Author

csviri commented Aug 8, 2023

Which issue(s) is this supposed to address? Is this about patching vs. sending a full version of the resource we're trying to update?

This is about sending the full update (non-SSA). Without doing this under optimistic locking, this part becomes very fuzzy:

https://github.com/java-operator-sdk/java-operator-sdk/blob/b99a8b7c32001c3aaeca740709472241d3918605/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/processing/event/source/informer/InformerEventSource.java#L283-L288

It could happen that another party made an update in the meantime, and we would simply overwrite it. Locking also makes the eventing easier to reason about, though that might not be necessary. I will create a separate issue to discuss that situation.
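
The lost-update concern above, and the earlier question about a null resourceVersion, can be sketched with a tiny in-memory model. Everything here (`OptimisticLockSketch`, `Stored`, `ConflictException`) is hypothetical and only mimics how a full update behaves with and without a resourceVersion; it is not a client or server API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the API server's handling of a full update. */
class OptimisticLockSketch {
  static final class Stored {
    final long resourceVersion;
    final String spec;

    Stored(long resourceVersion, String spec) {
      this.resourceVersion = resourceVersion;
      this.spec = spec;
    }
  }

  static final class ConflictException extends RuntimeException {
    ConflictException(String message) {
      super(message);
    }
  }

  private final Map<String, Stored> store = new HashMap<>();

  Stored create(String name, String spec) {
    Stored created = new Stored(1, spec);
    store.put(name, created);
    return created;
  }

  /**
   * Full update. A non-null expectedVersion makes the write conditional (locked);
   * null means "overwrite whatever is there" (unlocked).
   */
  Stored update(String name, Long expectedVersion, String newSpec) {
    Stored current = store.get(name);
    if (current == null) {
      throw new IllegalStateException("not found: " + name);
    }
    if (expectedVersion != null && expectedVersion.longValue() != current.resourceVersion) {
      throw new ConflictException("stale resourceVersion " + expectedVersion);
    }
    Stored updated = new Stored(current.resourceVersion + 1, newSpec);
    store.put(name, updated);
    return updated;
  }

  public static void main(String[] args) {
    OptimisticLockSketch api = new OptimisticLockSketch();
    Stored cached = api.create("config", "ours-v1");
    api.update("config", cached.resourceVersion, "theirs"); // another party writes first
    try {
      api.update("config", cached.resourceVersion, "ours-v2"); // locked: rejected as stale
    } catch (ConflictException e) {
      System.out.println("conflict: " + e.getMessage());
    }
    api.update("config", null, "ours-v2"); // unlocked: silently overwrites "theirs"
  }
}
```

With the version supplied, the stale write surfaces as a conflict that the caller can retry from fresh state; without it, the other party's change is lost without any signal.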

@@ -154,7 +154,7 @@ public R update(R actual, R target, P primary, Context<P> context) {
           .forceConflicts().serverSideApply();
     } else {
       var updatedActual = updaterMatcher.updateResource(actual, target, context);
-      updatedResource = prepare(updatedActual, primary, "Updating").replace();
+      updatedResource = prepare(updatedActual, primary, "Updating").update();
Collaborator

This may need to be an explicitly locked replace, or a patch. One subtle difference between update and replace is that replace does some modifications to the resource (Services, Jobs, and OpenShift RoleBindings) based upon the present state - see HasMetadataOperation.modifyItemForReplaceOrPatch. The intention is to remove that once replace is gone.

Collaborator Author

This does not add up to me. I see that such changes are made for these resources:

https://github.com/fabric8io/kubernetes-client/blob/0c7d5150702387c1aeca66facb98508d590934f2/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/batch/v1/JobOperationsImpl.java#L163-L175

Shouldn't it be the responsibility of the user to fill in those values?

I don't see why this should be a patch or a replace instead of an update because of this.

Collaborator

Yes, it's not clear to me either why these resources get special treatment…

Collaborator

@shawkins shawkins Aug 11, 2023

The root issue is that PUT has side effects. Service is the poster child for this: if you attempt a PUT and the clusterIP is not populated, then it will be allocated, which will then conflict with the existing one. If you use an empty string, it will complain that the field is immutable. People have complained about this for years (kubernetes/kubernetes#91459). So I guess that in the past they wanted to smooth this behavior out in the fabric8 client.

In the last couple of years when users complain of new situations like this that don't work with createOrReplace we have been telling them to manually do something like the proposed createOr, or more recently to use serverSideApply.

Collaborator

Yeah, things can get messy fast when you get into discussions of HTTP verb semantics :)
I do agree with one of the commenters that PUT should be idempotent so regardless of what controllers do, if they accepted one resource as valid at one point in time, they should accept that same resource again if re-PUT (and possibly return the existing one), which doesn't appear to be the case here…

Collaborator Author

The root issue is that PUT has side effects. Service is the poster child for this: if you attempt a PUT and the clusterIP is not populated, then it will be allocated, which will then conflict with the existing one. If you use an empty string, it will complain that the field is immutable. People have complained about this for years (kubernetes/kubernetes#91459). So I guess that in the past they wanted to smooth this behavior out in the fabric8 client.
In the last couple of years when users complain of new situations like this that don't work with createOrReplace we have been telling them to manually do something like the proposed createOr, or more recently to use serverSideApply.

Yeah, I think this workaround, for example, could be used nicely:
First load the existing Service that contains the current clusterIP. Set the old clusterIP on the updated V1Service.

For these special cases, I would rather prepare some default implementations in dependent resources than solve it at the client level here. So I would stick with update anyway.
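
As a rough illustration of the workaround quoted above (load the existing Service, carry its clusterIP over to the object being sent), here is a dependency-free sketch. `ServiceSketch` and `prepareForUpdate` are hypothetical names made up for this example; a default implementation in a dependent resource could follow the same shape but would use the real Service type.

```java
/** Hypothetical minimal model of a Service; only the fields relevant to this sketch. */
class ServiceSketch {
  String resourceVersion;
  String clusterIP; // allocated by the server on create, immutable afterwards
  int port;
}

class ClusterIpCarryOverSketch {
  /**
   * Builds the object to send in a full update: desired fields, plus the
   * server-assigned values taken from the existing resource.
   */
  static ServiceSketch prepareForUpdate(ServiceSketch existing, ServiceSketch desired) {
    ServiceSketch toSend = new ServiceSketch();
    toSend.port = desired.port;
    // Carry over the allocated clusterIP when the desired object leaves it unset.
    boolean unset = desired.clusterIP == null || desired.clusterIP.isEmpty();
    toSend.clusterIP = unset ? existing.clusterIP : desired.clusterIP;
    // Keep the cached resourceVersion so the update stays optimistically locked.
    toSend.resourceVersion = existing.resourceVersion;
    return toSend;
  }

  public static void main(String[] args) {
    ServiceSketch existing = new ServiceSketch();
    existing.resourceVersion = "7";
    existing.clusterIP = "10.0.0.12";
    existing.port = 80;

    ServiceSketch desired = new ServiceSketch(); // built by the user, no clusterIP set
    desired.port = 8080;

    ServiceSketch toSend = prepareForUpdate(existing, desired);
    System.out.println(toSend.clusterIP + ":" + toSend.port);
  }
}
```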

@csviri csviri merged commit 55bc16c into main Aug 14, 2023
@csviri csviri deleted the dr-fix-update-no-replace branch August 14, 2023 13:21