Race Between Initial Block Hashes can cause panic #74

Closed
TheBlueMatt opened this issue Oct 12, 2022 · 0 comments · Fixed by #78


MaxFangX, on Discord, reported:

Spent the last few hours pinning down the root cause of a channel manager panic. Turns out it was a race condition inherited from ldk-sample:

  • (1) If a channel manager is starting for the first time, it gets its initial block hash by calling BlockSource::get_blockchain_info() and constructing a BestBlock out of that.
  • (2) During the initial chain sync, if chain_tip.is_none(), the best chain_tip is retrieved using lightning_block_sync::init::validate_best_block_header(), which similarly queries the block source, via BlockSource::get_best_block().

If blocks were mined between (1) and (2), the channel manager’s best block will be at a lower height than the chain_tip fed to the SpvClient. Suppose that this results in the channel manager being at block 1 and the SpvClient being at block 2. Once block 3 comes in and the SpvClient detects it via poll_best_tip(), the SpvClient computes the ChainDifference between its saved chain_tip value and block 3, concluding that it just needs to connect block 3. However, when Listen::filtered_block_connected is called using the data in block 3, the channel manager panics with “Blocks must be connected in chain-order - the connected header must build on the last connected header”. This is because block 3’s prev block hash is block 2’s block hash, rather than block 1’s block hash, which the channel manager expects.

It seems very unlikely to happen in practice but that also means it has the potential to be a superrrrr flaky bug for someone else in the future. Came up in our integration tests because we have separate "init" and "sync" stages, where "init" initializes the channel manager and "sync" calls into lightning_block_sync::init, and we were mining blocks in between.
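
The race in the report can be reproduced in a toy model. The sketch below uses no LDK code: `Header`, `ToyChain`, `ToyListener`, and both startup functions are hypothetical stand-ins, and the listener returns an `Err` where the real channel manager would panic. `single_query_startup` shows one possible mitigation (query the tip once and reuse it for both the listener and the sync starting point). It is an assumption for illustration, not necessarily what the fix in #78 does.

```rust
/// Toy stand-in for a block header; only the fields needed to show the race.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Header {
    hash: u64,
    prev_hash: u64,
    height: u32,
}

/// Toy block source: an in-memory chain that can grow between queries.
struct ToyChain {
    headers: Vec<Header>,
}

impl ToyChain {
    /// Starts with a genesis header (height 0) plus block 1.
    fn new() -> Self {
        let genesis = Header { hash: 100, prev_hash: 0, height: 0 };
        let mut chain = ToyChain { headers: vec![genesis] };
        chain.mine();
        chain
    }

    /// Appends one block on top of the current tip.
    fn mine(&mut self) {
        let tip = self.best();
        self.headers.push(Header {
            hash: tip.hash + 1,
            prev_hash: tip.hash,
            height: tip.height + 1,
        });
    }

    /// Returns the current best header.
    fn best(&self) -> Header {
        *self.headers.last().expect("chain is never empty")
    }
}

/// Toy listener enforcing the in-order invariant quoted in the panic message.
struct ToyListener {
    best_hash: u64,
}

impl ToyListener {
    fn block_connected(&mut self, header: &Header) -> Result<(), String> {
        if header.prev_hash != self.best_hash {
            return Err(format!(
                "header at height {} does not build on the last connected header",
                header.height
            ));
        }
        self.best_hash = header.hash;
        Ok(())
    }
}

/// Racy startup: the tip is queried twice, with a block mined in between.
fn racy_startup(chain: &mut ToyChain) -> Result<(), String> {
    // (1) The listener takes its initial best block from the first query (block 1).
    let mut listener = ToyListener { best_hash: chain.best().hash };
    chain.mine(); // block 2 arrives between the two queries
    // (2) The sync starting point comes from a second query (block 2).
    let sync_tip = chain.best();
    chain.mine(); // block 3 arrives later
    // Only blocks after sync_tip are connected, so block 2 is skipped.
    for header in chain.headers.iter().filter(|h| h.height > sync_tip.height) {
        listener.block_connected(header)?;
    }
    Ok(())
}

/// Possible mitigation: query the tip once and reuse it for both purposes.
fn single_query_startup(chain: &mut ToyChain) -> Result<(), String> {
    let tip = chain.best();
    let mut listener = ToyListener { best_hash: tip.hash };
    chain.mine(); // block 2
    let sync_tip = tip; // reuse the first result instead of querying again
    chain.mine(); // block 3
    // Blocks 2 and 3 are both connected, in order.
    for header in chain.headers.iter().filter(|h| h.height > sync_tip.height) {
        listener.block_connected(header)?;
    }
    Ok(())
}
```

Calling `racy_startup(&mut ToyChain::new())` returns an error when block 3 is connected, mirroring the scenario above, while `single_query_startup(&mut ToyChain::new())` connects blocks 2 and 3 without complaint.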
