ReplicationOperation should fail gracefully (#115341)
Problem:
finishAsFailed can be called asynchronously while operations such
as runPostReplicationActions, which tries to sync the translog, are
still in progress. finishAsFailed immediately fails the
resultListener, which releases the index shard primary operation
permit. As a result, runPostReplicationActions may try to sync the
translog without holding an operation permit.
Solution:
We refactor how ReplicationOperation handles pendingActions and the
resultListener, replacing both with a RefCountingListener. This way,
asynchronous failures are aggregated, and the result listener is
called exactly once, after all in-flight operations are done.
For the specific error we got in issue #97183, this means that
a call to onNoLongerPrimary (which can happen if we fail to fail
a replica shard or mark it as stale) will not immediately release
the primary operation permit and the assertion in the translog sync
will be honored.
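
To illustrate the idea, here is a minimal, self-contained sketch of
ref-counted completion. It is not the Elasticsearch RefCountingListener
API: the class RefCountingSketch and its methods acquire, recordFailure,
and close are hypothetical names used only for this example. It assumes
each in-flight step holds a reference, failures are collected rather than
delivered immediately, and the result listener runs only when the last
reference is released.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

/** Hypothetical, simplified stand-in for a ref-counting listener. */
final class RefCountingSketch {
    // Starts at 1: the creator holds the initial reference until close().
    private final AtomicInteger refs = new AtomicInteger(1);
    private final List<Exception> failures = new ArrayList<>();
    private final Consumer<List<Exception>> resultListener;

    RefCountingSketch(Consumer<List<Exception>> resultListener) {
        this.resultListener = resultListener;
    }

    /** Acquires a reference for one in-flight step; run the result to release it. */
    Runnable acquire() {
        refs.incrementAndGet();
        AtomicBoolean released = new AtomicBoolean();
        // Guard so that releasing the same reference twice has no effect.
        return () -> {
            if (released.compareAndSet(false, true)) {
                release();
            }
        };
    }

    /** Records an asynchronous failure without completing the listener. */
    void recordFailure(Exception e) {
        synchronized (failures) {
            failures.add(e);
        }
    }

    /** Releases the creator's initial reference; call exactly once. */
    void close() {
        release();
    }

    private void release() {
        // Only the release that drops the count to zero notifies the listener.
        if (refs.decrementAndGet() == 0) {
            List<Exception> snapshot;
            synchronized (failures) {
                snapshot = List.copyOf(failures);
            }
            resultListener.accept(snapshot);
        }
    }
}
```

In this sketch, a failure recorded while a step still holds its reference
does not reach the result listener until that step releases it, which
mirrors how the primary operation permit stays held until the mid-way
work has finished.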
Fixes #97183