Simplify initialization of max_seq_no of updates #41161

Merged on Apr 30, 2019 (13 commits)

dnhatn
Member

@dnhatn dnhatn commented Apr 12, 2019

Today we choose to initialize max_seq_no_of_updates (MSU) on primaries only so we can deal with a situation where a primary is on an old node (before 6.5), which does not have MSU, while replicas are on new nodes (6.5+). However, this strategy is quite complex and can lead to bugs (for example #40249) since we have to assign a correct value (not too low) to MSU in all possible situations (before recovering from translog, restoring history on promotion, and handing off relocation).

Fortunately, we don't have to deal with this BWC in 7.0+ since all nodes in the cluster should have MSU. This change simplifies the initialization of MSU by always assigning it a correct value in the constructor of Engine regardless of whether it's a replica or primary.
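
A minimal sketch of the idea described above, not the actual Elasticsearch code: MSU is seeded once in the constructor, so primaries and replicas start from the same value and no later code path has to remember to initialize it. `SketchEngine`, its constructor parameter, and the method names are hypothetical, and the sketch assumes that the highest sequence number already present in the shard is a safe (not too low) starting value.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for an engine; this is not the Elasticsearch Engine class.
final class SketchEngine {
    // MSU: the max sequence number of any update or delete seen so far.
    private final AtomicLong maxSeqNoOfUpdatesOrDeletes;

    // Assumption: maxSeqNoInShard is the highest seqNo already present locally.
    SketchEngine(long maxSeqNoInShard) {
        // Same initialization for primaries and replicas; there is no
        // "unassigned" state that a later step would need to fix up.
        this.maxSeqNoOfUpdatesOrDeletes = new AtomicLong(maxSeqNoInShard);
    }

    long getMaxSeqNoOfUpdatesOrDeletes() {
        return maxSeqNoOfUpdatesOrDeletes.get();
    }

    // MSU only moves forward; passing a lower value leaves it unchanged.
    void advanceMaxSeqNoOfUpdatesOrDeletes(long seqNo) {
        maxSeqNoOfUpdatesOrDeletes.updateAndGet(current -> Math.max(current, seqNo));
    }
}
```

With this shape, the three situations listed in the description (recovering from translog, restoring history on promotion, handing off relocation) would only ever need to advance MSU, never to initialize it.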

Relates #33842

/cc @bleskes

@dnhatn dnhatn added the >enhancement, :Distributed Indexing/Engine, v8.0.0, and v7.2.0 labels Apr 12, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-distributed

Contributor

@henningandersen henningandersen left a comment

@dnhatn, thanks for looking into this simplification, looking good.

I do wonder if we should move the field to InternalEngine, let me know your thoughts on that.

@dnhatn
Member Author

dnhatn commented Apr 14, 2019

@henningandersen Thanks for looking. I pushed changes.

@dnhatn dnhatn requested a review from henningandersen April 14, 2019 22:28
Contributor

@henningandersen henningandersen left a comment

@dnhatn thanks for the additional work on this. I left two comments; I hope you can clarify my uncertainty around advanceMaxSeqNoOfUpdatesOrDeletesOnPrimary.

@@ -131,11 +131,6 @@ protected long generateSeqNoForOperationOnPrimary(final Operation operation) {
return operation.seqNo();
}

@Override
Contributor

I am not sure about this, mainly because I do not understand why it was introduced in #39571. I think it was done to keep the behavior identical to what it was before the code was changed to not mutate the engine during the planning step. I am not sure it is important to avoid advancing MSU here, though, since MSU should already be at least the delete seqNo on a follower?

Member Author

@henningandersen Two reasons that I removed it:

  • This should be a noop for the following engine since MSU should be at least the seqno of a delete.
  • We should delegate not only deletes but also updates to advanceMaxSeqNoOfUpdatesOrDeletesOnPrimary.

The downside is that we might miss bugs where MSU from the leader is not passed correctly, because advanceMaxSeqNoOfUpdatesOrDeletesOnPrimary would hide them.

I am good with either keeping or removing it. What do you think?

Contributor

I opt to remove it. One option to mitigate the downside is to override delete() and assert that seqNo <= MSU in FollowingEngine?

Contributor

Good plan. We talked about this, Nhat. After reflection, let's do it here in this PR.

Member Author

I pushed 0f4492b to restore this method and add an assertion in the following engine.
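
A hedged sketch of the kind of check discussed in this thread, following the suggestion to verify seqNo <= MSU when a following engine handles a delete. `SketchFollowingEngine` and its methods are hypothetical and are not the real FollowingEngine; where exactly the assertion lives in the pushed commit is not shown in this conversation. The sketch throws explicitly instead of using the `assert` keyword so that the check fires even when JVM assertions are disabled.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a following engine; not the real FollowingEngine.
final class SketchFollowingEngine {
    private final AtomicLong maxSeqNoOfUpdatesOrDeletes;

    SketchFollowingEngine(long initialMsu) {
        this.maxSeqNoOfUpdatesOrDeletes = new AtomicLong(initialMsu);
    }

    // Called with the MSU value received from the leader; never moves backwards.
    void advanceMaxSeqNoOfUpdatesOrDeletes(long msuFromLeader) {
        maxSeqNoOfUpdatesOrDeletes.updateAndGet(current -> Math.max(current, msuFromLeader));
    }

    // Checks the invariant instead of silently advancing MSU, so a leader
    // MSU that was not passed on correctly surfaces as a failure.
    void delete(long seqNo) {
        long msu = maxSeqNoOfUpdatesOrDeletes.get();
        if (seqNo > msu) {
            throw new AssertionError("delete seqNo [" + seqNo + "] > MSU [" + msu + "]");
        }
        // Applying the delete itself is omitted in this sketch.
    }
}
```

A delete whose seqNo is at or below MSU passes the check; a higher one fails until the leader's MSU has been applied through `advanceMaxSeqNoOfUpdatesOrDeletes`.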

@dnhatn dnhatn requested a review from henningandersen April 15, 2019 16:16
@dnhatn
Member Author

dnhatn commented Apr 27, 2019

@henningandersen @ywelsch This is ready for another round. Can you please give it a go? Thank you!

@dnhatn dnhatn requested review from ywelsch and removed request for s1monw, ywelsch and henningandersen April 27, 2019 00:19
Contributor

@ywelsch ywelsch left a comment

LGTM

@dnhatn
Member Author

dnhatn commented Apr 29, 2019

@henningandersen Are you good with this change?

Contributor

@henningandersen henningandersen left a comment

LGTM

@dnhatn
Member Author

dnhatn commented Apr 30, 2019

@henningandersen @ywelsch Thank you for your reviews.

@dnhatn dnhatn merged commit 73bfdc4 into elastic:master Apr 30, 2019
@dnhatn dnhatn deleted the init-msu branch April 30, 2019 00:57
dnhatn added a commit that referenced this pull request Apr 30, 2019
akhil10x5 pushed a commit to akhil10x5/elasticsearch that referenced this pull request May 2, 2019
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
marregui added a commit to crate/crate that referenced this pull request Dec 2, 2019
Port of elastic/elasticsearch#41161
marregui added a commit to crate/crate that referenced this pull request Dec 2, 2019
Port of elastic/elasticsearch#41161
marregui added a commit to crate/crate that referenced this pull request Dec 3, 2019
Port of elastic/elasticsearch#41161
mergify bot pushed a commit to crate/crate that referenced this pull request Dec 3, 2019
Port of elastic/elasticsearch#41161
mergify bot pushed a commit to crate/crate that referenced this pull request Dec 3, 2019
Port of elastic/elasticsearch#41161

(cherry picked from commit 62ebfb0)
marregui added a commit to crate/crate that referenced this pull request Dec 3, 2019
Port of elastic/elasticsearch#41161

(cherry picked from commit 62ebfb0)
marregui added a commit to crate/crate that referenced this pull request Dec 3, 2019
Port of elastic/elasticsearch#41161

(cherry picked from commit 62ebfb0)
mergify bot pushed a commit to crate/crate that referenced this pull request Dec 3, 2019
Port of elastic/elasticsearch#41161

(cherry picked from commit 62ebfb0)
Labels
:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. >enhancement v7.2.0 v8.0.0-alpha1
5 participants