Slow reconciliations with JOSDK 5.0.3 release and failed reconciliation count in JOSDK metrics #2709

afalhambra-hivemq · 2025-02-27T10:04:04Z

Bug Report

What did you do?

We already migrated to the 5.0.2 JOSDK release with no issues, but since the 5.0.3 release we have been hitting performance degradation: slow reconciliations (greater than 1 second) during a rolling restart, along with some unexpected error metrics from JOSDK:

operator_sdk_reconciliations_failed_total{exception="KubernetesClientException",group="hivemq.com",kind="HiveMQPlatform",name="test-platform",namespace="customextensionwithsubpathinstallationit",scope="namespace",version="v1"} 1.0

What did you expect to see?

No slow reconciliations (greater than 1 second) occur, and no error metrics are reported when the reconciliation completes successfully.

What did you see instead? Under which circumstances?

Slow reconciliations (greater than 1 second) and unexpected error count metrics with the same KubernetesClientException when reconciling during a rolling restart.

Environment

Kubernetes cluster type:

K3S

$ Mention java-operator-sdk version from pom.xml file

5.0.3

$ java -version

openjdk 21.0.3 2024-04-16 LTS

$ kubectl version

Client Version: v1.32.2
Kustomize Version: v5.5.0
Server Version: v1.32.0

Possible Solution

Additional context

metacosm · 2025-02-27T10:16:38Z

Would you happen to have the associated stacktraces?

afalhambra-hivemq · 2025-02-27T10:20:55Z

Would you happen to have the associated stacktraces?

As I mentioned in the ticket, there is no error or stacktrace available in the logs, as all the reconciliation loops complete without further issues. Could it be an internal error generated by JOSDK or the Fabric8 client?

Would it be useful if I increased the JOSDK log level to DEBUG and attached the log?

csviri · 2025-02-27T11:06:03Z

@afalhambra-hivemq yes, please turn on debug-level logging and see if there is anything useful.

csviri · 2025-02-27T11:10:59Z

Also, is this operator open source?

csviri · 2025-02-27T11:21:16Z

It is quite strange; effectively only this changed:
https://github.com/operator-framework/java-operator-sdk/pull/2696/files
The only difference is that we clone the resource before doing the patch update.

metacosm · 2025-02-27T11:36:53Z

It is quite strange; effectively only this changed: https://github.com/operator-framework/java-operator-sdk/pull/2696/files The only difference is that we clone the resource before doing the patch update.

Well, cloning is a potentially costly operation, so it's not that big of a surprise…

afalhambra-hivemq · 2025-02-27T11:49:25Z

Below is the only exception stacktrace I noticed after setting the log level to DEBUG.

11:20:19.705 [ReconcilerExecutor-hivemq-controller-524] INFO  c.h.p.o.h.HiveMQPlatformReconcilerRollingRestartHandler - [operator] Stopping surge Pod
11:20:19.728 [-1149346687-pool-2-thread-14] INFO  c.h.p.o.t.OperatorK3sContainer - [EVENT] Normal [RollingRestart] HiveMQ Platform is stopping the surge Pod (rolling restart) [operator-multi-label-selector-migration:test-platform]
11:20:19.728 [ReconcilerExecutor-hivemq-controller-524] DEBUG c.h.p.operator.event.EventSender - [operator] Updating K8s event rolling-restart-scale-down-in-progress: [RollingRestart] HiveMQ Platform is stopping the surge Pod (rolling restart)
11:20:19.728 [ReconcilerExecutor-hivemq-controller-524] INFO  c.h.p.o.h.AbstractHiveMQReconcilerStateHandler - [operator] Update HiveMQ Platform Status (ROLLING_RESTART [RESTART_PODS_IN_PROGRESS] -> ROLLING_RESTART [SCALE_DOWN_IN_PROGRESS]): HiveMQ Platform is stopping the surge Pod (rolling restart)
11:20:19.748 [ReconcilerExecutor-hivemq-controller-524] DEBUG i.j.o.p.event.EventProcessor - Event processing finished. Scope: ExecutionScope{ resource id: ResourceID{name='test-platform', namespace='operator-multi-label-selector-migration'}, version: 1141}, PostExecutionControl: PostExecutionControl{onlyFinalizerHandled=false, updatedCustomResource=null, runtimeException=io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PATCH at: https://localhost:32775/apis/hivemq.com/v1/namespaces/operator-multi-label-selector-migration/hivemq-platforms/test-platform/status?fieldManager=hivemq-controller&force=true. Message: Operation cannot be fulfilled on hivemq-platforms.hivemq.com "test-platform": the object has been modified; please apply your changes to the latest version and try again. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=hivemq.com, kind=hivemq-platforms, name=test-platform, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Operation cannot be fulfilled on hivemq-platforms.hivemq.com "test-platform": the object has been modified; please apply your changes to the latest version and try again, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Conflict, status=Failure, additionalProperties={}).}
11:20:19.748 [ReconcilerExecutor-hivemq-controller-524] DEBUG i.j.o.p.event.EventProcessor - Full client conflict error during event processing ExecutionScope{ resource id: ResourceID{name='test-platform', namespace='operator-multi-label-selector-migration'}, version: 1141}
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PATCH at: https://localhost:32775/apis/hivemq.com/v1/namespaces/operator-multi-label-selector-migration/hivemq-platforms/test-platform/status?fieldManager=hivemq-controller&force=true. Message: Operation cannot be fulfilled on hivemq-platforms.hivemq.com "test-platform": the object has been modified; please apply your changes to the latest version and try again. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=hivemq.com, kind=hivemq-platforms, name=test-platform, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Operation cannot be fulfilled on hivemq-platforms.hivemq.com "test-platform": the object has been modified; please apply your changes to the latest version and try again, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Conflict, status=Failure, additionalProperties={}).
	at io.fabric8.kubernetes.client.KubernetesClientException.copyAsCause(KubernetesClientException.java:205)
	at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:507)
	at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:524)
	at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handlePatch(OperationSupport.java:419)
	at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handlePatch(OperationSupport.java:397)
	at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handlePatch(BaseOperation.java:764)
	at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.lambda$patch$2(HasMetadataOperation.java:231)
	at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:236)
	at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:251)
	at io.fabric8.kubernetes.client.dsl.internal.HasMetadataOperation.patch(HasMetadataOperation.java:44)
	at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher$CustomResourceFacade.patchStatus(ReconciliationDispatcher.java:413)
	at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.reconcileExecution(ReconciliationDispatcher.java:168)
	at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleReconcile(ReconciliationDispatcher.java:125)
	at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleDispatch(ReconciliationDispatcher.java:94)
	at io.javaoperatorsdk.operator.processing.event.ReconciliationDispatcher.handleExecution(ReconciliationDispatcher.java:67)
	at io.javaoperatorsdk.operator.processing.event.EventProcessor$ReconcilerExecutor.run(EventProcessor.java:444)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: PATCH at: https://localhost:32775/apis/hivemq.com/v1/namespaces/operator-multi-label-selector-migration/hivemq-platforms/test-platform/status?fieldManager=hivemq-controller&force=true. Message: Operation cannot be fulfilled on hivemq-platforms.hivemq.com "test-platform": the object has been modified; please apply your changes to the latest version and try again. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=hivemq.com, kind=hivemq-platforms, name=test-platform, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Operation cannot be fulfilled on hivemq-platforms.hivemq.com "test-platform": the object has been modified; please apply your changes to the latest version and try again, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Conflict, status=Failure, additionalProperties={}).
	at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.requestFailure(OperationSupport.java:642)
	at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.requestFailure(OperationSupport.java:622)
	at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.assertResponseCode(OperationSupport.java:582)
	at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.lambda$handleResponse$0(OperationSupport.java:549)
	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
	at io.fabric8.kubernetes.client.http.StandardHttpClient.lambda$completeOrCancel$10(StandardHttpClient.java:141)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
	at io.fabric8.kubernetes.client.http.ByteArrayBodyHandler.onBodyDone(ByteArrayBodyHandler.java:51)
	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
	at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2179)
	at io.fabric8.kubernetes.client.vertx.VertxHttpRequest.lambda$consumeBytes$1(VertxHttpRequest.java:120)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:270)
	at io.vertx.core.impl.ContextInternal.dispatch(ContextInternal.java:252)
	at io.vertx.core.http.impl.HttpEventHandler.handleEnd(HttpEventHandler.java:76)
	at io.vertx.core.http.impl.HttpClientResponseImpl.handleEnd(HttpClientResponseImpl.java:250)
	at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.lambda$new$0(Http1xClientConnection.java:421)
	at io.vertx.core.streams.impl.InboundBuffer.handleEvent(InboundBuffer.java:279)
	at io.vertx.core.streams.impl.InboundBuffer.write(InboundBuffer.java:157)
	at io.vertx.core.http.impl.Http1xClientConnection$StreamImpl.handleEnd(Http1xClientConnection.java:731)
	at io.vertx.core.impl.ContextImpl.execute(ContextImpl.java:327)
	at io.vertx.core.impl.ContextImpl.execute(ContextImpl.java:307)
	at io.vertx.core.http.impl.Http1xClientConnection.handleResponseEnd(Http1xClientConnection.java:962)
	at io.vertx.core.http.impl.Http1xClientConnection.handleHttpMessage(Http1xClientConnection.java:832)
	at io.vertx.core.http.impl.Http1xClientConnection.handleMessage(Http1xClientConnection.java:796)
	at io.vertx.core.net.impl.ConnectionBase.read(ConnectionBase.java:159)
	at io.vertx.core.net.impl.VertxHandler.channelRead(VertxHandler.java:153)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.CombinedChannelDuplexHandler$DelegatingChannelHandlerContext.fireChannelRead(CombinedChannelDuplexHandler.java:436)
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:346)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:318)
	at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:251)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1515)
	at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1378)
	at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1427)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:530)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:469)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1357)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:868)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:796)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:732)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:658)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:998)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 common frames omitted
11:20:19.758 [ReconcilerExecutor-hivemq-controller-524] WARN  i.j.o.p.event.EventProcessor - Resource Kubernetes Resource Creator/Update Conflict during reconciliation. Message: Failure executing: PATCH at: https://localhost:32775/apis/hivemq.com/v1/namespaces/operator-multi-label-selector-migration/hivemq-platforms/test-platform/status?fieldManager=hivemq-controller&force=true. Message: Operation cannot be fulfilled on hivemq-platforms.hivemq.com "test-platform": the object has been modified; please apply your changes to the latest version and try again. Received status: Status(apiVersion=v1, code=409, details=StatusDetails(causes=[], group=hivemq.com, kind=hivemq-platforms, name=test-platform, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Operation cannot be fulfilled on hivemq-platforms.hivemq.com "test-platform": the object has been modified; please apply your changes to the latest version and try again, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Conflict, status=Failure, additionalProperties={}). Resource name: null
11:20:19.759 [ReconcilerExecutor-hivemq-controller-524] DEBUG i.j.o.p.event.EventProcessor - Scheduling timer event for retry with delay:2000 for resource: ResourceID{name='test-platform', namespace='operator-multi-label-selector-migration'}

This is thrown when there is a rolling restart of the pods. Our reconciler is explicitly configured to use SSA:

                .withUseSSAToPatchPrimaryResource(true)
                .withSSABasedCreateUpdateMatchForDependentResources(true));

It's also weird that the reconciliation loops are taking longer than with version 5.0.2.

metacosm · 2025-02-27T11:54:26Z

If there are exceptions and retries in place, then it makes sense that the reconciliations take longer, as JOSDK will probably need multiple attempts to perform an operation that previously succeeded the first time.
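
To put a rough number on this: the DEBUG log above shows a retry being scheduled with `delay:2000` after the conflict. The following is only a back-of-the-envelope sketch, assuming the same fixed delay after every failed attempt and a hypothetical 100 ms per reconcile attempt (neither the class nor the method is part of JOSDK):

```java
public class RetryLatencySketch {

    // Total wall-clock time in milliseconds until the operation finally succeeds,
    // assuming every failed attempt is followed by the same fixed retry delay.
    static long timeToSuccessMillis(int failedAttempts, long attemptMillis, long retryDelayMillis) {
        long costOfFailures = failedAttempts * (attemptMillis + retryDelayMillis);
        return costOfFailures + attemptMillis; // plus the final, successful attempt
    }

    public static void main(String[] args) {
        long attemptMillis = 100;     // hypothetical duration of one reconcile attempt
        long retryDelayMillis = 2000; // retry delay seen in the DEBUG log ("delay:2000")

        // No conflict: a single attempt.
        System.out.println(timeToSuccessMillis(0, attemptMillis, retryDelayMillis));
        // One 409 conflict, then a successful retry.
        System.out.println(timeToSuccessMillis(1, attemptMillis, retryDelayMillis));
    }
}
```

Under these assumptions a single conflict is already enough to push one reconciliation past the 1-second mark mentioned in the report.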

csviri · 2025-02-27T11:57:18Z

@afalhambra-hivemq what changed is that with SSA, since I guess you are not passing a fresh resource, the resourceVersion is now set on the resource, so the patch now performs optimistic locking (before, it did not). What you can do is set metadata.resourceVersion to null before UpdateControl.patchStatus() is called. That will resolve this issue.

I will add this to the blog post, but see also: https://javaoperatorsdk.io/blog/2025/02/25/from-legacy-approach-to-server-side-apply/
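
To illustrate the mechanism, here is a toy in-memory model of the optimistic-locking check described above. It is not the Kubernetes API server, Fabric8 or JOSDK code; `Store` and `ConflictException` are hypothetical names, and the only behaviour modelled is that a patch carrying a stale resourceVersion is rejected while a patch without one is applied unconditionally:

```java
public class OptimisticLockingSketch {

    /** Stands in for the HTTP 409 Conflict response seen in the stacktrace. */
    static class ConflictException extends RuntimeException {
        ConflictException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the stored resource: a version counter plus a status. */
    static class Store {
        private long resourceVersion = 1141;
        private String status = "RESTART_PODS_IN_PROGRESS";

        // A patch with a non-null resourceVersion must match the stored version
        // (optimistic locking); a patch with a null resourceVersion is always applied.
        synchronized long patchStatus(String patchResourceVersion, String newStatus) {
            if (patchResourceVersion != null
                    && !patchResourceVersion.equals(Long.toString(resourceVersion))) {
                throw new ConflictException("the object has been modified");
            }
            status = newStatus;
            return ++resourceVersion;
        }

        synchronized String status() {
            return status;
        }
    }

    public static void main(String[] args) {
        Store store = new Store();
        String cachedVersion = "1141"; // version the reconciler was handed

        // Something else updates the resource in the meantime, bumping the stored version.
        store.patchStatus(null, "RESTART_PODS_IN_PROGRESS");

        try {
            // Patch still carries the old version: rejected.
            store.patchStatus(cachedVersion, "SCALE_DOWN_IN_PROGRESS");
        } catch (ConflictException e) {
            System.out.println("conflict: " + e.getMessage());
        }

        // Same patch with the version cleared: applied.
        long newVersion = store.patchStatus(null, "SCALE_DOWN_IN_PROGRESS");
        System.out.println("applied " + store.status() + ", new version " + newVersion);
    }
}
```

In a real reconciler the corresponding step would presumably be clearing the version on the primary resource's metadata, for example `resource.getMetadata().setResourceVersion(null)`, before returning `UpdateControl.patchStatus(resource)`.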

csviri · 2025-02-27T12:10:59Z

see also: #2710

afalhambra-hivemq · 2025-02-27T12:17:25Z

To give some context here, this is happening in a rolling restart in the main reconciliation loop. We have some managed dependent resources, like in this case a StatefulSet:

@KubernetesDependent(informer = @Informer(labelSelector = LABEL_SELECTOR))
public class StatefulSetResource extends CRUDKubernetesDependentResource<StatefulSet, HiveMQPlatform> {

For that dependent resource (DR), reconciliation is skipped: no action is needed because the desired state matches the existing one:

11:20:19.652 [pool-38-thread-33] DEBUG c.h.p.o.d.StatefulSetResource - [operator] Desired StatefulSet (3 replicas) for status ROLLING_RESTART (RESTART_PODS_IN_PROGRESS)
11:20:19.652 [pool-38-thread-33] DEBUG i.j.o.p.e.s.i.InformerEventSource - Using PrimaryToSecondaryMapper to find secondary resources for primary: CustomResource{kind='HiveMQPlatform', apiVersion='hivemq.com/v1', metadata=ObjectMeta(annotations={meta.helm.sh/release-name=test-platform, meta.helm.sh/release-namespace=operator-multi-label-selector-migration}, creationTimestamp=2025-02-27T11:16:50Z, deletionGracePeriodSeconds=null, deletionTimestamp=null, finalizers=[hivemq-platforms.hivemq.com/finalizer], generateName=null, generation=2, labels={app.kubernetes.io/instance=test-platform, app.kubernetes.io/managed-by=Helm, app.kubernetes.io/name=hivemq-platform, app.kubernetes.io/version=4.x.y, helm.sh/chart=hivemq-platform-0.x.y}, managedFields=[ManagedFieldsEntry(apiVersion=hivemq.com/v1, fieldsType=FieldsV1, fieldsV1=FieldsV1(additionalProperties={f:metadata={f:annotations={f:meta.helm.sh/release-name={}, f:meta.helm.sh/release-namespace={}}, f:finalizers={v:"hivemq-platforms.hivemq.com/finalizer"={}}, f:labels={f:app.kubernetes.io/instance={}, f:app.kubernetes.io/managed-by={}, f:app.kubernetes.io/name={}, f:app.kubernetes.io/version={}, f:helm.sh/chart={}}}, f:status={f:crdVersion={}, f:message={}, f:reconciliationRequests={}, f:recoveryInformation={f:message={}, f:state={}, f:statePhase={}}, f:restartExtensions={}, f:state={}, f:statePhase={}}}), manager=hivemq-controller, operation=Apply, subresource=status, time=2025-02-27T11:18:45Z, additionalProperties={}), ManagedFieldsEntry(apiVersion=hivemq.com/v1, fieldsType=FieldsV1, fieldsV1=FieldsV1(additionalProperties={f:status={.={}, f:crdVersion={}, f:recoveryInformation={}, f:restartExtensions={}}}), manager=fabric8-kubernetes-client, operation=Update, subresource=status, time=2025-02-27T11:16:50Z, additionalProperties={}), ManagedFieldsEntry(apiVersion=hivemq.com/v1, fieldsType=FieldsV1, fieldsV1=FieldsV1(additionalProperties={f:metadata={f:annotations={.={}, f:meta.helm.sh/release-name={}, 
f:meta.helm.sh/release-namespace={}}, f:finalizers={.={}, v:"hivemq-platforms.hivemq.com/finalizer"={}}, f:labels={.={}, f:app.kubernetes.io/instance={}, f:app.kubernetes.io/managed-by={}, f:app.kubernetes.io/name={}, f:app.kubernetes.io/version={}, f:helm.sh/chart={}}}, f:spec={.={}, f:configMapName={}, f:enabled={}, f:extensions={}, f:healthApiPort={}, f:logLevel={}, f:metricsPath={}, f:metricsPort={}, f:operatorRestApiPort={}, f:secretName={}, f:services={}, f:statefulSet={.={}, f:metadata={}, f:spec={.={}, f:replicas={}, f:template={.={}, f:spec={.={}, f:containers={}}}}}, f:statefulSetMigration={}}}), manager=fabric8-kubernetes-client, operation=Update, subresource=null, time=2025-02-27T11:18:08Z, additionalProperties={})], name=test-platform, namespace=operator-multi-label-selector-migration, ownerReferences=[], resourceVersion=1141, selfLink=null, uid=e6853a0d-65bf-402c-87fd-fd7de6c20153, additionalProperties={}), spec=com.hivemq.platform.operator.v1.HiveMQPlatformSpec@5a0d1321, status=com.hivemq.platform.operator.v1.HiveMQPlatformStatus@707e94f1, deprecated=false, deprecationWarning=null}. Found secondary ids: [ResourceID{name='hivemq-configuration-test-platform', namespace='operator-multi-label-selector-migration'}] 
11:20:19.652 [pool-38-thread-33] DEBUG i.j.o.p.e.s.i.ManagedInformerEventSource - Resource not found in temporary cache reading it from informer cache, for Resource ID: ResourceID{name='hivemq-configuration-test-platform', namespace='operator-multi-label-selector-migration'}
11:20:19.652 [pool-38-thread-33] DEBUG i.j.o.p.e.s.i.ManagedInformerEventSource - Resource found in cache: true for id: ResourceID{name='hivemq-configuration-test-platform', namespace='operator-multi-label-selector-migration'}
11:20:19.656 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: metadata actual map value: managedFieldValue: {f:annotations={f:javaoperatorsdk.io/previous={}, f:meta.helm.sh/release-name={}, f:meta.helm.sh/release-namespace={}}, f:labels={f:app.kubernetes.io/instance={}, f:app.kubernetes.io/managed-by={}, f:app.kubernetes.io/name={}, f:app.kubernetes.io/version={}, f:helm.sh/chart={}, f:hivemq-platform={}}, f:ownerReferences={k:{"uid":"e6853a0d-65bf-402c-87fd-fd7de6c20153"}={}}}
11:20:19.656 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: annotations actual map value: managedFieldValue: {f:javaoperatorsdk.io/previous={}, f:meta.helm.sh/release-name={}, f:meta.helm.sh/release-namespace={}}
11:20:19.656 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: labels actual map value: managedFieldValue: {f:app.kubernetes.io/instance={}, f:app.kubernetes.io/managed-by={}, f:app.kubernetes.io/name={}, f:app.kubernetes.io/version={}, f:helm.sh/chart={}, f:hivemq-platform={}}
11:20:19.656 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: spec actual map value: managedFieldValue: {f:replicas={}, f:selector={}, f:serviceName={}, f:template={f:metadata={f:annotations={f:kubernetes-resource-versions={}, f:meta.helm.sh/release-name={}, f:meta.helm.sh/release-namespace={}}, f:labels={f:app.kubernetes.io/instance={}, f:app.kubernetes.io/managed-by={}, f:app.kubernetes.io/name={}, f:app.kubernetes.io/version={}, f:helm.sh/chart={}, f:hivemq-platform={}}}, f:spec={f:containers={k:{"name":"hivemq"}={.={}, f:command={}, f:env={k:{"name":"HIVEMQ_BIND_ADDRESS"}={.={}, f:name={}, f:valueFrom={f:fieldRef={}}}, k:{"name":"JAVA_OPTS"}={.={}, f:name={}, f:value={}}}, f:image={}, f:imagePullPolicy={}, f:livenessProbe={f:failureThreshold={}, f:httpGet={f:path={}, f:port={}, f:scheme={}}, f:initialDelaySeconds={}, f:periodSeconds={}}, f:name={}, f:ports={k:{"containerPort":1883,"protocol":"TCP"}={.={}, f:containerPort={}, f:name={}}, k:{"containerPort":8080,"protocol":"TCP"}={.={}, f:containerPort={}, f:name={}}, k:{"containerPort":9399,"protocol":"TCP"}={.={}, f:containerPort={}, f:name={}}}, f:readinessProbe={f:failureThreshold={}, f:httpGet={f:path={}, f:port={}, f:scheme={}}, f:initialDelaySeconds={}, f:periodSeconds={}}, f:resources={}, f:volumeMounts={k:{"mountPath":"/etc/podinfo/"}={.={}, f:mountPath={}, f:name={}}, k:{"mountPath":"/opt/hivemq/conf-k8s/"}={.={}, f:mountPath={}, f:name={}}, k:{"mountPath":"/opt/hivemq/operator/"}={.={}, f:mountPath={}, f:name={}}}}}, f:initContainers={k:{"name":"hivemq-platform-operator-init"}={.={}, f:image={}, f:imagePullPolicy={}, f:name={}, f:resources={f:limits={f:cpu={}, f:ephemeral-storage={}, f:memory={}}, f:requests={f:cpu={}, f:ephemeral-storage={}, f:memory={}}}, f:volumeMounts={k:{"mountPath":"/hivemq"}={.={}, f:mountPath={}, f:name={}}}}}, f:securityContext={}, f:serviceAccountName={}, f:terminationGracePeriodSeconds={}, 
f:volumes={k:{"name":"broker-configuration"}={.={}, f:configMap={f:name={}}, f:name={}}, k:{"name":"operator-init"}={.={}, f:emptyDir={}, f:name={}}, k:{"name":"pod-info"}={.={}, f:configMap={f:name={}}, f:name={}}}}}, f:updateStrategy={f:type={}}}
11:20:19.657 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: template actual map value: managedFieldValue: {f:metadata={f:annotations={f:kubernetes-resource-versions={}, f:meta.helm.sh/release-name={}, f:meta.helm.sh/release-namespace={}}, f:labels={f:app.kubernetes.io/instance={}, f:app.kubernetes.io/managed-by={}, f:app.kubernetes.io/name={}, f:app.kubernetes.io/version={}, f:helm.sh/chart={}, f:hivemq-platform={}}}, f:spec={f:containers={k:{"name":"hivemq"}={.={}, f:command={}, f:env={k:{"name":"HIVEMQ_BIND_ADDRESS"}={.={}, f:name={}, f:valueFrom={f:fieldRef={}}}, k:{"name":"JAVA_OPTS"}={.={}, f:name={}, f:value={}}}, f:image={}, f:imagePullPolicy={}, f:livenessProbe={f:failureThreshold={}, f:httpGet={f:path={}, f:port={}, f:scheme={}}, f:initialDelaySeconds={}, f:periodSeconds={}}, f:name={}, f:ports={k:{"containerPort":1883,"protocol":"TCP"}={.={}, f:containerPort={}, f:name={}}, k:{"containerPort":8080,"protocol":"TCP"}={.={}, f:containerPort={}, f:name={}}, k:{"containerPort":9399,"protocol":"TCP"}={.={}, f:containerPort={}, f:name={}}}, f:readinessProbe={f:failureThreshold={}, f:httpGet={f:path={}, f:port={}, f:scheme={}}, f:initialDelaySeconds={}, f:periodSeconds={}}, f:resources={}, f:volumeMounts={k:{"mountPath":"/etc/podinfo/"}={.={}, f:mountPath={}, f:name={}}, k:{"mountPath":"/opt/hivemq/conf-k8s/"}={.={}, f:mountPath={}, f:name={}}, k:{"mountPath":"/opt/hivemq/operator/"}={.={}, f:mountPath={}, f:name={}}}}}, f:initContainers={k:{"name":"hivemq-platform-operator-init"}={.={}, f:image={}, f:imagePullPolicy={}, f:name={}, f:resources={f:limits={f:cpu={}, f:ephemeral-storage={}, f:memory={}}, f:requests={f:cpu={}, f:ephemeral-storage={}, f:memory={}}}, f:volumeMounts={k:{"mountPath":"/hivemq"}={.={}, f:mountPath={}, f:name={}}}}}, f:securityContext={}, f:serviceAccountName={}, f:terminationGracePeriodSeconds={}, f:volumes={k:{"name":"broker-configuration"}={.={}, f:configMap={f:name={}}, f:name={}}, 
k:{"name":"operator-init"}={.={}, f:emptyDir={}, f:name={}}, k:{"name":"pod-info"}={.={}, f:configMap={f:name={}}, f:name={}}}}}
11:20:19.657 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: metadata actual map value: managedFieldValue: {f:annotations={f:kubernetes-resource-versions={}, f:meta.helm.sh/release-name={}, f:meta.helm.sh/release-namespace={}}, f:labels={f:app.kubernetes.io/instance={}, f:app.kubernetes.io/managed-by={}, f:app.kubernetes.io/name={}, f:app.kubernetes.io/version={}, f:helm.sh/chart={}, f:hivemq-platform={}}}
11:20:19.657 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: annotations actual map value: managedFieldValue: {f:kubernetes-resource-versions={}, f:meta.helm.sh/release-name={}, f:meta.helm.sh/release-namespace={}}
11:20:19.657 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: labels actual map value: managedFieldValue: {f:app.kubernetes.io/instance={}, f:app.kubernetes.io/managed-by={}, f:app.kubernetes.io/name={}, f:app.kubernetes.io/version={}, f:helm.sh/chart={}, f:hivemq-platform={}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: spec actual map value: managedFieldValue: {f:containers={k:{"name":"hivemq"}={.={}, f:command={}, f:env={k:{"name":"HIVEMQ_BIND_ADDRESS"}={.={}, f:name={}, f:valueFrom={f:fieldRef={}}}, k:{"name":"JAVA_OPTS"}={.={}, f:name={}, f:value={}}}, f:image={}, f:imagePullPolicy={}, f:livenessProbe={f:failureThreshold={}, f:httpGet={f:path={}, f:port={}, f:scheme={}}, f:initialDelaySeconds={}, f:periodSeconds={}}, f:name={}, f:ports={k:{"containerPort":1883,"protocol":"TCP"}={.={}, f:containerPort={}, f:name={}}, k:{"containerPort":8080,"protocol":"TCP"}={.={}, f:containerPort={}, f:name={}}, k:{"containerPort":9399,"protocol":"TCP"}={.={}, f:containerPort={}, f:name={}}}, f:readinessProbe={f:failureThreshold={}, f:httpGet={f:path={}, f:port={}, f:scheme={}}, f:initialDelaySeconds={}, f:periodSeconds={}}, f:resources={}, f:volumeMounts={k:{"mountPath":"/etc/podinfo/"}={.={}, f:mountPath={}, f:name={}}, k:{"mountPath":"/opt/hivemq/conf-k8s/"}={.={}, f:mountPath={}, f:name={}}, k:{"mountPath":"/opt/hivemq/operator/"}={.={}, f:mountPath={}, f:name={}}}}}, f:initContainers={k:{"name":"hivemq-platform-operator-init"}={.={}, f:image={}, f:imagePullPolicy={}, f:name={}, f:resources={f:limits={f:cpu={}, f:ephemeral-storage={}, f:memory={}}, f:requests={f:cpu={}, f:ephemeral-storage={}, f:memory={}}}, f:volumeMounts={k:{"mountPath":"/hivemq"}={.={}, f:mountPath={}, f:name={}}}}}, f:securityContext={}, f:serviceAccountName={}, f:terminationGracePeriodSeconds={}, f:volumes={k:{"name":"broker-configuration"}={.={}, f:configMap={f:name={}}, f:name={}}, k:{"name":"operator-init"}={.={}, f:emptyDir={}, f:name={}}, k:{"name":"pod-info"}={.={}, f:configMap={f:name={}}, f:name={}}}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: valueFrom actual map value: managedFieldValue: {f:fieldRef={}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: livenessProbe actual map value: managedFieldValue: {f:failureThreshold={}, f:httpGet={f:path={}, f:port={}, f:scheme={}}, f:initialDelaySeconds={}, f:periodSeconds={}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: httpGet actual map value: managedFieldValue: {f:path={}, f:port={}, f:scheme={}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: readinessProbe actual map value: managedFieldValue: {f:failureThreshold={}, f:httpGet={f:path={}, f:port={}, f:scheme={}}, f:initialDelaySeconds={}, f:periodSeconds={}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: httpGet actual map value: managedFieldValue: {f:path={}, f:port={}, f:scheme={}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: resources actual map value: managedFieldValue: {f:limits={f:cpu={}, f:ephemeral-storage={}, f:memory={}}, f:requests={f:cpu={}, f:ephemeral-storage={}, f:memory={}}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: limits actual map value: managedFieldValue: {f:cpu={}, f:ephemeral-storage={}, f:memory={}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: requests actual map value: managedFieldValue: {f:cpu={}, f:ephemeral-storage={}, f:memory={}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: configMap actual map value: managedFieldValue: {f:name={}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: configMap actual map value: managedFieldValue: {f:name={}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.k.SSABasedGenericKubernetesResourceMatcher - key: updateStrategy actual map value: managedFieldValue: {f:type={}}
11:20:19.658 [pool-38-thread-33] DEBUG i.j.o.p.d.AbstractDependentResource - Update skipped for dependent ResourceID{name='test-platform', namespace='operator-multi-label-selector-migration'} as it matched the existing one

Then the main reconciliation is invoked, and when patching the status of the custom resource there, this exception is thrown. But no further changes are made to the custom resource, so it should be a fresh and up-to-date instance, or am I wrong?

csviri · 2025-02-27T12:21:41Z

this exception is thrown. But no further changes are made to the custom resource, so it should be a fresh and up-to-date instance, or am I wrong?

Well, unless something modifies it in the background, it should be. But this is clearly a conflict, and for a status patch there is usually no reason to use optimistic locking, so I would just adjust it so that the resource does not carry a resourceVersion.
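
To illustrate why a stale resourceVersion leads to a conflict and why clearing it avoids one, here is a minimal, self-contained sketch. It is a simplified in-memory model, not the Kubernetes API server or the JOSDK API: `OptimisticLockingSketch`, `Resource`, `InMemoryStore` and `ConflictException` are all hypothetical names introduced for this example. In a real operator the corresponding step would be clearing the version on the resource metadata (for a Fabric8 resource, something like `resource.getMetadata().setResourceVersion(null)`) before the status patch.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of resourceVersion-based optimistic locking (an assumption-laden sketch). */
public class OptimisticLockingSketch {

    /** Hypothetical stand-in for a custom resource; only the fields the sketch needs. */
    static final class Resource {
        final String name;
        String resourceVersion; // null means "skip the version check"
        String status;

        Resource(String name, String resourceVersion, String status) {
            this.name = name;
            this.resourceVersion = resourceVersion;
            this.status = status;
        }

        Resource copy() {
            return new Resource(name, resourceVersion, status);
        }
    }

    /** Raised where a real API server would answer with HTTP 409 Conflict. */
    static final class ConflictException extends RuntimeException {
        ConflictException(String message) {
            super(message);
        }
    }

    /** Hypothetical in-memory store that mimics the server-side version check. */
    static final class InMemoryStore {
        private final Map<String, Resource> stored = new HashMap<>();
        private long versionCounter = 0;

        Resource create(String name, String status) {
            Resource created = new Resource(name, Long.toString(++versionCounter), status);
            stored.put(name, created);
            return created.copy();
        }

        Resource get(String name) {
            return stored.get(name).copy();
        }

        Resource patchStatus(Resource desired) {
            Resource current = stored.get(desired.name);
            // Optimistic locking is only enforced when the caller sends a version.
            if (desired.resourceVersion != null
                    && !desired.resourceVersion.equals(current.resourceVersion)) {
                throw new ConflictException("stale resourceVersion " + desired.resourceVersion
                        + ", current is " + current.resourceVersion);
            }
            current.status = desired.status;
            current.resourceVersion = Long.toString(++versionCounter);
            return current.copy();
        }
    }

    public static void main(String[] args) {
        InMemoryStore store = new InMemoryStore();
        Resource cached = store.create("test-platform", "Pending");

        // Something else modifies the resource in the background,
        // so the cached copy now carries a stale resourceVersion.
        Resource concurrent = store.get("test-platform");
        concurrent.status = "Running";
        store.patchStatus(concurrent);

        cached.status = "Ready";
        try {
            store.patchStatus(cached);
        } catch (ConflictException e) {
            System.out.println("Conflict: " + e.getMessage());
        }

        // Clearing the version opts out of optimistic locking for this patch.
        cached.resourceVersion = null;
        Resource patched = store.patchStatus(cached);
        System.out.println("Patched status: " + patched.status);
    }
}
```

The trade-off is that a patch without a resourceVersion overwrites the status regardless of concurrent changes, which is usually acceptable for a status that only the operator itself writes.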

afalhambra-hivemq · 2025-02-27T14:41:30Z

OK, I confirm that setting the resourceVersion to null fixes this issue. I will try to find out what is causing the resource to not be up to date.

But I'm still curious, as this issue should have come up in the 5.0.2 release as well. I'm not sure how this PR for the 5.0.3 release can affect that now.

csviri · 2025-02-27T14:44:40Z

But I'm still curious, as this issue should have come up in the 5.0.2 release as well. I'm not sure how this PR for the 5.0.3 release can affect that now.

Before that PR, the resourceVersion was explicitly set to null for SSA too; that is no longer the case after that PR.

csviri · 2025-02-27T14:45:14Z

But mainly, glad it helped. I will close this issue if there are no objections.

csviri closed this as completed Feb 27, 2025
