Fix rollback logic #130

Merged
merged 2 commits into from
Jun 16, 2020

erikd
Contributor

@erikd erikd commented Jun 11, 2020

Orphan blocks occur when a new block does not extend the chain from the
most recent block in the database, but instead extends it from an earlier
block. The blocks stored after that earlier block are then orphaned.
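
A minimal sketch of that definition, using a hypothetical in-memory model rather than the project's schema or DB API. The `Block` record and the `orphanedBy` function are illustrative names only. Given the hash of the block a new block extends, it returns the stored blocks that would become orphans:

```haskell
-- Hypothetical, simplified model; the real block table has many more columns.
data Block = Block
  { blockNo   :: Int
  , blockHash :: String
  } deriving (Eq, Show)

-- The chain is stored oldest-first. Everything after the parent of the
-- incoming block is orphaned. Nothing means the parent is not in the chain.
orphanedBy :: String -> [Block] -> Maybe [Block]
orphanedBy parentHash chain =
  case break ((== parentHash) . blockHash) chain of
    (_, [])       -> Nothing
    (_, _ : rest) -> Just rest

exampleChain :: [Block]
exampleChain = [Block 1 "a", Block 2 "b", Block 3 "c"]

main :: IO ()
main = do
  print (orphanedBy "c" exampleChain) -- new block extends the tip
  print (orphanedBy "a" exampleChain) -- new block extends an earlier block
  print (orphanedBy "z" exampleChain) -- parent unknown
```

When the new block extends the tip, the orphan list is empty, so the ordinary case needs no separate branch.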

@erikd erikd force-pushed the erikd/orphan-blocks branch from 273f95d to 5c81bd9 Compare June 12, 2020 04:07
@erikd erikd changed the title Detect and rollback orphaned blocks Fox rollback logic Jun 12, 2020
@erikd erikd force-pushed the erikd/orphan-blocks branch from 5c81bd9 to 5cb3be4 Compare June 12, 2020 04:11
@erikd erikd changed the title Fox rollback logic Fix rollback logic Jun 12, 2020
@erikd erikd requested a review from dcoutts June 12, 2020 05:57
@erikd
Contributor Author

erikd commented Jun 12, 2020

If @dcoutts is happy with this it can be merged.

Comment on lines 39 to 53
slotNo <- lift DB.queryLatestSlotNo
if slotNo <= unSlotNo slot
  then
    liftIO . logInfo trce $
      mconcat
        [ "Byron: No rollback required: db tip slot is ", textShow slotNo
        , ", ledger tip slot is ", textShow (unSlotNo slot)
        ]
  else do
    liftIO . logInfo trce $
      mconcat
        [ "Byron: Rollbacking to slot ", textShow (unSlotNo slot)
        , ", hash ", Byron.renderAbstractHash hash
        ]
    void . lift $ DB.deleteCascadeSlotNo slotNo
Contributor

I think this isn't right yet and can still be a lot simpler.

The slot and hash we are given are the point to roll back to. So we want to delete the blocks after that point, keeping that point as the most recent point.

This means we never need to check slotNo <= unSlotNo slot because it's an unnecessary special case. If we're told to roll back to our current tip point then by definition deleting blocks after that point is a no-op.

But looking at DB.deleteCascadeSlotNo slotNo, it does not delete blocks after the given slot. It deletes blocks in that slot, and only that slot. That cannot be right because it will only delete the current tip block, but we have to be able to cope with rollbacks of arbitrary depth.

So this should be unconditional: just delete all blocks that have a strictly higher slot number than the slot of the rollback point. As an assertion / paranoia check, we could check that the (slot, hash) pair does indeed exist in the DB, so that the rollback makes sense.

There's no special case needed for the "No rollback required" case. That comes out naturally.
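
The suggestion above can be sketched as a pure function over an in-memory list. This is only an illustrative model under stated assumptions: `Block`, `Point`, `pointExists`, and `rollbackTo` are hypothetical names, not the project's DB API, and a real implementation would issue a delete query instead of filtering a list.

```haskell
import Data.Word (Word64)

-- Hypothetical, simplified block record.
data Block = Block
  { blockSlot :: Word64
  , blockHash :: String
  } deriving (Eq, Show)

-- The rollback target: it must remain as the tip afterwards.
data Point = Point
  { pointSlot :: Word64
  , pointHash :: String
  } deriving (Eq, Show)

-- Paranoia check: the (slot, hash) pair should already be stored.
pointExists :: Point -> [Block] -> Bool
pointExists (Point s h) = any (\b -> blockSlot b == s && blockHash b == h)

-- Unconditional rollback: drop every block with a strictly higher slot.
-- Returns Nothing when the rollback point is unknown.
rollbackTo :: Point -> [Block] -> Maybe [Block]
rollbackTo pt blocks
  | pointExists pt blocks =
      Just (filter (\b -> blockSlot b <= pointSlot pt) blocks)
  | otherwise = Nothing

exampleChain :: [Block]
exampleChain = [Block 10 "a", Block 20 "b", Block 30 "c"]

main :: IO ()
main = do
  print (rollbackTo (Point 20 "b") exampleChain) -- drops the block in slot 30
  print (rollbackTo (Point 30 "c") exampleChain) -- rollback to tip: no-op
  print (rollbackTo (Point 20 "x") exampleChain) -- unknown point
```

Rolling back to the current tip returns the chain unchanged, so the "No rollback required" case needs no separate branch.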

@erikd
Contributor Author

erikd commented Jun 13, 2020

Without the query and the if statement this fails on startup:

Shelley: Rollbacking to slot 699307, hash 357d742304db0037a0c11391b7768aded4b27f37902998c9fb35dc2e981f6e52
DB lookup fail in insertABlock: block hash 357d742304db0037a0c11391b7768aded4b27f37902998c9fb35dc2e981f6e52

This means the correct chain tip was dropped, so the subsequent roll forward fails because the block it needs to extend is no longer in the database.

With the query and the if, I see:

Shelley: No rollback required: db tip slot is 699284, ledger tip slot is 699284

and it works correctly.
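
One plausible reading of this failure, assuming the unconditional version deleted the blocks in the rollback slot itself (as the review describes deleteCascadeSlotNo doing), is sketched below. The names `deleteInSlot`, `deleteAfterSlot`, and `hasParent`, and the slot numbers, are hypothetical and chosen only for illustration.

```haskell
import Data.Word (Word64)

-- Hypothetical, simplified block record.
data Block = Block
  { blockSlot :: Word64
  , blockHash :: String
  } deriving (Eq, Show)

-- Deletes the blocks in exactly the given slot.
deleteInSlot :: Word64 -> [Block] -> [Block]
deleteInSlot s = filter ((/= s) . blockSlot)

-- Deletes only the blocks strictly after the given slot.
deleteAfterSlot :: Word64 -> [Block] -> [Block]
deleteAfterSlot s = filter ((<= s) . blockSlot)

-- A roll forward needs the parent hash of the new block to be present.
hasParent :: String -> [Block] -> Bool
hasParent h = any ((== h) . blockHash)

exampleDb :: [Block]
exampleDb = [Block 10 "prev", Block 20 "tip"]

main :: IO ()
main = do
  -- Rolling back to the tip at slot 20, then inserting a child of "tip".
  print (hasParent "tip" (deleteInSlot 20 exampleDb))    -- parent was dropped
  print (hasParent "tip" (deleteAfterSlot 20 exampleDb)) -- parent survives
```

Under that assumption, deleting in the slot removes the rollback point itself, while deleting strictly after the slot keeps it, which is why the latter needs no guard.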

@erikd erikd force-pushed the erikd/orphan-blocks branch from 4c67f7a to 957f5ee Compare June 16, 2020 23:32
@erikd
Contributor Author

erikd commented Jun 16, 2020

New version of fix.
Logging output:

insertShelleyBlock: slot 1031396, block 42038, hash 8cfb194798a39c3600e821495e97d763b3f2f121b6ad5d39df68ebafbe277d0b
Shelley: Rolling back to slot 1031387, hash 9582d85383aa56d8e71acd32b25cde9d5e9ea453574c4e273f90fd372fc5d4e7
Shelley: Deleting blocks numbered: [42038]
insertShelleyBlock: slot 1031397, block 42038, hash 41a16cdcb29511cd53f9815f551918b0c3c2c6af8a2cddbf3d41e311a3065bcb
Shelley: Rolling back to slot 1031387, hash 9582d85383aa56d8e71acd32b25cde9d5e9ea453574c4e273f90fd372fc5d4e7
Shelley: Deleting blocks numbered: [42038]
insertShelleyBlock: slot 1031396, block 42038, hash 8cfb194798a39c3600e821495e97d763b3f2f121b6ad5d39df68ebafbe277d0b
insertShelleyBlock: slot 1031402, block 42039, hash c7ed80db757e14ad1c1706e2ba3be14f0bda69f26072336fe6c1f463e18cb29b
Shelley: Rolling back to slot 1031387, hash 9582d85383aa56d8e71acd32b25cde9d5e9ea453574c4e273f90fd372fc5d4e7
Shelley: Deleting blocks numbered: [42039, 42038]
insertShelleyBlock: slot 1031397, block 42038, hash 41a16cdcb29511cd53f9815f551918b0c3c2c6af8a2cddbf3d41e311a3065bcb
insertShelleyBlock: slot 1031426, block 42039, hash e3ceb7230fe41bb38be21608050c08d57e2cf01ca4a95f7885bea4f213c34f5c
Shelley: Rolling back to slot 1031387, hash 9582d85383aa56d8e71acd32b25cde9d5e9ea453574c4e273f90fd372fc5d4e7
Shelley: Deleting blocks numbered: [42039, 42038]
insertShelleyBlock: slot 1031396, block 42038, hash 8cfb194798a39c3600e821495e97d763b3f2f121b6ad5d39df68ebafbe277d0b
insertShelleyBlock: slot 1031402, block 42039, hash c7ed80db757e14ad1c1706e2ba3be14f0bda69f26072336fe6c1f463e18cb29b

erikd added 2 commits June 17, 2020 09:40
Rollback messages from chainsync were not being handled correctly which
meant that orphaned blocks were not removed. When orphaned blocks were
not removed from the block table that table could end up with two or
more rows with the same block number.

Closes: #114

BlockId is not guaranteed to be monotonic.
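
A small sketch of why that matters, assuming a hypothetical table in which row ids do not follow chain order. `Row`, `doomedById`, and `doomedBySlot` are illustrative names, not the project's schema or queries. Selecting the rows to delete by id can miss blocks that selecting by slot number finds:

```haskell
import Data.Word (Word64)

-- Hypothetical row: a database-assigned id plus the block's slot number.
data Row = Row
  { rowId   :: Int
  , rowSlot :: Word64
  } deriving (Eq, Show)

-- Rows selected for deletion when comparing database ids.
doomedById :: Int -> [Row] -> [Row]
doomedById tipId = filter ((> tipId) . rowId)

-- Rows selected for deletion when comparing slot numbers.
doomedBySlot :: Word64 -> [Row] -> [Row]
doomedBySlot slot = filter ((> slot) . rowSlot)

-- Ids are deliberately out of step with slot order.
exampleRows :: [Row]
exampleRows = [Row 1 10, Row 3 20, Row 2 30]

main :: IO ()
main = do
  -- Rolling back to the block in slot 20, which has row id 3.
  print (doomedById 3 exampleRows)    -- misses the block in slot 30
  print (doomedBySlot 20 exampleRows) -- finds it
```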
Contributor

@dcoutts dcoutts left a comment

LGTM!

@erikd erikd force-pushed the erikd/orphan-blocks branch from 957f5ee to 65b6b70 Compare June 16, 2020 23:40
@erikd erikd merged commit 1d94717 into master Jun 16, 2020
@iohk-bors iohk-bors bot deleted the erikd/orphan-blocks branch June 16, 2020 23:41