- Introduction
- Failure Modes and Recovery Paths
- Generic failure modes:
- Specific Action Items
- Generic Action Items
- Audit Requirements
Author | George Knee |
Created at | 2025-03-20 |
Initial Reviewers | Mark Tyneway |
Need Approval From | Tom Assas, Michael Amadi (Shadowing) |
Status | Implementing Actions 🛫 |
The "Withdrawals Root in Block Header" feature copies some information stored in the L2 blockchain state into the block header, making it part of the history (information stored by all nodes, including non-archive nodes). The information in question is the `L2ToL1MessagePasser` account storage root, and it is stored in the previously unused `withdrawalsRoot` field of the block header.
This allows proposals to be made and verified without needing to bear the cost of running an archive node.
Below are references for this project:
- Description: If the `withdrawalsRoot` in the block header is incorrect, critical infrastructure used to enable withdrawals may fail. Namely, output proposals and challenges would be incorrect, affecting both chains with permissioned proofs and chains with permissionless proofs. This is because, once the Isthmus fork activates, these components use the `withdrawalsRoot` header field instead of querying the information from an archive node in the usual way. Output roots are returned by the op-node `optimism_outputAtBlock` RPC method, which behaves differently under Isthmus: when handling a request for an output root, it no longer delegates an `eth_getProof` call to op-geth and instead reads the information from the block header.

  Triggers:
  - A failed hardfork activation in the execution client.
  - An execution client bug. For example, the root could be (incorrectly) added to the header before the state is fully committed.
  - If we were ever to introduce non-empty `withdrawals` in the block body, this might override the mechanism introduced with this feature and invalidate the interpretation of the `withdrawalsRoot` field.
- Risk Assessment: High impact, low likelihood. Temporary downtime for withdrawals, or loss of funds if not remediated in time to challenge a malicious proposal.
- Mitigations:
  - We rely on e2e tests to check for consistency between the outputs returned from op-node and those constructed manually in the old way.
  - Instead of waiting for the failure mode to materialize and then writing a patch in a rush, we could add an optional config var to op-node to switch it back to the old behavior. Rolling out the fix would then not require a software release.
  - op-geth could be modified to log a critical error (triggering an alert) if the `withdrawals` list in the body is ever non-empty.
- Detection: Fault proof monitoring systems may not detect this failure mode until an actor running patched software makes a proposal or challenge.
- Recovery Path(s): Fault proof infra would need to be pointed at a patched op-node. The patch would restore the old behaviour for generating output roots.
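The change in where the storage root comes from, and the proposed override that switches back to the old behaviour, can be sketched roughly as follows. This is a minimal illustration, not op-node code: the `Header`, `Config`, `ProofFetcher` and `ForceLegacy` names are all hypothetical, and the `eth_getProof` round trip is reduced to a plain function value.

```go
package main

import (
	"errors"
	"fmt"
)

// Hash is a 32-byte value such as a storage root.
type Hash [32]byte

// Header is a hypothetical, trimmed-down block header.
type Header struct {
	Time            uint64
	WithdrawalsRoot *Hash // nil when the field is not populated
}

// ProofFetcher is a hypothetical stand-in for an eth_getProof request to the
// execution client. It returns the L2ToL1MessagePasser storage root.
type ProofFetcher func(h Header) (Hash, error)

// Config is hypothetical. ForceLegacy models the proposed optional override.
type Config struct {
	IsthmusTime *uint64 // nil means Isthmus is not scheduled
	ForceLegacy bool
}

func (c Config) isIsthmus(ts uint64) bool {
	return c.IsthmusTime != nil && ts >= *c.IsthmusTime
}

// MessagePasserStorageRoot decides where the storage root is read from.
// Post-Isthmus it comes from the header unless the override is set;
// otherwise it is fetched through the legacy proof path.
func MessagePasserStorageRoot(c Config, h Header, fetch ProofFetcher) (Hash, error) {
	if c.isIsthmus(h.Time) && !c.ForceLegacy {
		if h.WithdrawalsRoot == nil {
			return Hash{}, errors.New("isthmus header is missing withdrawalsRoot")
		}
		return *h.WithdrawalsRoot, nil
	}
	return fetch(h)
}

func main() {
	isthmus := uint64(100)
	headerRoot := Hash{0xaa}
	proofRoot := Hash{0xbb}
	fetch := func(Header) (Hash, error) { return proofRoot, nil }
	h := Header{Time: 150, WithdrawalsRoot: &headerRoot}

	fromHeader, _ := MessagePasserStorageRoot(Config{IsthmusTime: &isthmus}, h, fetch)
	fromProof, _ := MessagePasserStorageRoot(Config{IsthmusTime: &isthmus, ForceLegacy: true}, h, fetch)
	fmt.Printf("header path: %x\n", fromHeader[:1])
	fmt.Printf("legacy path: %x\n", fromProof[:1])
}
```

With an override of this shape, recovery amounts to a configuration change on the op-node serving the fault proof infra, and the two paths can be compared side by side while both are available.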
- Description: Because this feature introduces a new p2p gossip topic and message serialization format, a bug could cause p2p gossip to fail on any chain with Isthmus active. This would cause an unsafe chain halt on affected nodes (but the safe chain would still progress).
- Risk Assessment: High impact, low likelihood.
- Mitigations: We rely on end-to-end testing (including fuzzing) to catch any bugs in this code path. We could run extended fuzzing campaigns.
- Detection: Continuous integration, Kurtosis testing and/or devnet testing should catch this. Failing that, if the bug makes it to production, our alerting infrastructure would notify us.
- Recovery Path(s): The bug would need to be patched and a new op-node release cut and rolled out.
See the generic FMA:
- Chain halt at activation (there is a change to the engine API, which elevates this risk)
- Activation failure
- Invalid setImplementation execution
- Chain split (across clients)
- (BLOCKING) e2e tests must check for consistency between output roots returned from op-node and those constructed manually in the old way https://github.com/ethereum-optimism/optimism/blob/6a436fe9ac9acb215b0f4b9f87ccd3832f4d6b72/op-e2e/actions/upgrades/isthmus_fork_test.go#L286-L301
- (non-BLOCKING) op-node could be furnished with an override to make it serve output roots in the legacy fashion; this would also aid in testing (see above item). It would even allow us to run the two systems side by side for a time before fully switching over. ethereum-optimism/optimism#15150
- (non-BLOCKING) op-geth could be made to log a critical error, triggering an alert, if the `withdrawals` list in the block body is ever non-empty (post-Isthmus)
- (BLOCKING): We have implemented extensive unit and end-to-end testing of the activation flow: https://github.com/ethereum-optimism/optimism/blob/develop/op-e2e/actions/upgrades/isthmus_fork_test.go
- (BLOCKING): We have implemented multi-client testing with kurtosis and/or devnets to reduce the chance of bugs. This should take the form of acceptance tests that target all client types in the network ethereum-optimism/optimism#15102
- (BLOCKING) We should ensure that our usual suite of alerts applies to devnets and that alerts are routed to the protocol engineers signing off on the devnet completion.
- (BLOCKING): Run fuzzing on the v4 p2p gossip for more than 10s (assignee: @Ethnical @geoknee) ethereum-optimism/optimism#15068
- (BLOCKING): We tested the activation on our devnets.
- (non-BLOCKING): Create a monitor that performs differential testing between the Merkle tree inclusion computation and the `withdrawalsRoot` read from the block header (assignee: @Ethnical). Tracking -> Monitoring Security-Issue
- (non-BLOCKING): We have implemented fuzz testing in a kurtosis multi-client devnet to reduce the chance of bugs
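The differential monitoring item and the non-empty `withdrawals` alert both reduce to per-block invariants, which could be sketched as below. The `Block` type and `CheckBlock` function are hypothetical. The sketch assumes the independently derived storage root (for example, one obtained through `eth_getProof` on an archive node) is supplied by the caller.

```go
package main

import "fmt"

// Hash is a 32-byte value such as a storage root.
type Hash [32]byte

// Block is a hypothetical view of a post-Isthmus L2 block.
type Block struct {
	Number          uint64
	WithdrawalsRoot Hash // value read from the block header
	WithdrawalCount int  // length of the withdrawals list in the block body
}

// CheckBlock returns one alert string per violated invariant:
//  1. the header withdrawalsRoot must equal the independently derived root
//  2. the withdrawals list in the body must be empty
func CheckBlock(b Block, proofRoot Hash) []string {
	var alerts []string
	if b.WithdrawalsRoot != proofRoot {
		alerts = append(alerts, fmt.Sprintf(
			"block %d: header withdrawalsRoot %x differs from proof-derived root %x",
			b.Number, b.WithdrawalsRoot[:4], proofRoot[:4]))
	}
	if b.WithdrawalCount != 0 {
		alerts = append(alerts, fmt.Sprintf(
			"block %d: withdrawals list has %d entries, expected none",
			b.Number, b.WithdrawalCount))
	}
	return alerts
}

func main() {
	good := Block{Number: 1, WithdrawalsRoot: Hash{0xaa}}
	bad := Block{Number: 2, WithdrawalsRoot: Hash{0xaa}, WithdrawalCount: 1}

	fmt.Println("alerts for good block:", len(CheckBlock(good, Hash{0xaa})))
	for _, a := range CheckBlock(bad, Hash{0xbb}) {
		fmt.Println(a)
	}
}
```

Running a check like this continuously would give a detection path that does not depend on a third party making a proposal or challenge.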
An audit has not been deemed necessary.