[Tracker] Interop: Production Infrastructure Architecture #15175

axelKingsley · 2025-04-02T16:37:19Z

This collection of tasks needs to be handled before we can realistically expect to run a public/production network.

Why is this Important

Optimism's Native Interop is based on the ability to avoid including invalid Interop Messages. In practical terms, a Sequencer with a functioning Supervisor can determine with a high degree of accuracy whether Interop Messages are Cross Unsafe. However, it is not 100% accurate, because cross-unsafe data is inherently unsafe: it is an optimistic extension of the chain, and could be proven false if the remote chain reorgs.

But in practical terms, what is the likelihood of the remote chain experiencing a reorg? Unsafe reorgs on OP Stack chains are currently extremely uncommon, with only one such event on OP Mainnet since Bedrock.

With Interop-based block invalidation, there is a new opportunity for reorgs on chains. Block builders who built their unsafe chain from an invalid message can either fruitlessly extend the invalid chain and pay to publish the batch on the L1 so it can be properly replaced, or they can drop the stalling unsafe chain aggressively.

There is a vicious cycle here -- the way for Supervisor Checks to fail is for there to be reorgs. And the way for there to be reorgs is for Supervisor Checks to fail. This suggests that under normal operation of a superchain, no invalid messages are ever added to blocks. However, when an invalid message is added to a block, it opens the door for more invalid messages to get into the pipeline.

There are going to be emergent behaviors in superchain stability that we should be prepared to respond to with things like Admin APIs and Leadership Transfers. If a recurrence of invalid messages continually affects the block builders' ability to advance the chain, we need the tools to stop interop processing so that networks can resume normal operation. After normal operation resumes, reintroducing Interop Message functionality would be safe again.
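To make the "stop interop processing" idea concrete, here is a minimal sketch of an incident switch on the block builder. Everything in it (`BlockBuilder`, `admin_set_interop`, the `has_interop_access_list` field) is a hypothetical name for illustration, not an existing OP Stack API; it only shows the intended behavior, which is that ordinary transactions keep flowing while interop transactions are held back.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tx:
    tx_hash: str
    has_interop_access_list: bool


@dataclass
class BlockBuilder:
    # Hypothetical admin-controlled flag; not an existing OP Stack setting.
    interop_enabled: bool = True

    def admin_set_interop(self, enabled: bool) -> None:
        """Hypothetical admin endpoint: toggle inclusion of interop txs."""
        self.interop_enabled = enabled

    def select(self, pending: List[Tx]) -> List[Tx]:
        """Return the pending txs that are eligible for the next block."""
        if self.interop_enabled:
            return list(pending)
        # During an incident, hold back interop txs but keep ordinary traffic.
        return [tx for tx in pending if not tx.has_interop_access_list]
```

Flipping the flag back restores the normal path without a restart, which matches the "resume, then reintroduce" sequence described above.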

We need to start getting experience with these kinds of outcomes by using Kurtosis-based Devnets plus large-scale Network Automation like the NAT framework being developed. In the meantime, we have a number of practical component behaviors that we want to implement, which we believe will allow for better network response by block builders and the transaction pipeline.

Documents and their Action Items

Topology and Tx Flow for Interop chain Operation

proxyd to call checkAccessList for Transactions with Interop Access Lists
Make Mempool Ingress checkAccessList Filter configurable (off for sequencers)
Remove block-builder checkInterop call
Introduce recurring checkInterop batch call in Mempool
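A rough sketch of how the mempool items above could fit together, assuming the Supervisor check can be modeled as a single batch function. The `Mempool` class, its fields, and the `BatchCheck` signature are assumptions made for illustration; the real checkAccessList / checkInterop calls may take different arguments.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a Supervisor call: takes a batch of access lists
# and returns one validity flag per access list, in the same order.
BatchCheck = Callable[[List[List[str]]], List[bool]]


class Mempool:
    def __init__(self, check: BatchCheck, ingress_filter: bool = True) -> None:
        self.check = check
        # Configurable ingress filter; the list above suggests "off" for sequencers.
        self.ingress_filter = ingress_filter
        self.interop: Dict[str, List[str]] = {}  # tx hash -> access list entries
        self.plain: List[str] = []

    def add(self, tx_hash: str, access_list: Optional[List[str]] = None) -> bool:
        """Admit a tx; interop txs are checked at ingress only if the filter is on."""
        if not access_list:
            self.plain.append(tx_hash)
            return True
        if self.ingress_filter and not self.check([access_list])[0]:
            return False
        self.interop[tx_hash] = access_list
        return True

    def recheck(self) -> List[str]:
        """Recurring batch check: evict interop txs that are no longer valid."""
        hashes = list(self.interop)
        if not hashes:
            return []
        results = self.check([self.interop[h] for h in hashes])
        evicted = [h for h, ok in zip(hashes, results) if not ok]
        for h in evicted:
            del self.interop[h]
        return evicted
```

With the recurring `recheck` in place, the block builder can pull from the pool without making its own check call, which is the motivation for removing the block-builder check.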

Transaction Handling FMA

proxyd will need a rate-limit feature to avoid overloading the Supervisor
- This is because proxyd doesn't have an up to date balance view of the sender
- the rate limit can be given heuristics to still allow trustable traffic through
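One possible shape for such a rate limit is a token bucket with a bypass for trusted senders. This is only a sketch: the allowlist heuristic, the class name `InteropRateLimiter`, and its parameters are assumptions, and proxyd's actual rate-limit configuration is not modeled here.

```python
import time
from typing import Callable, Set


class InteropRateLimiter:
    """Token bucket guarding Supervisor calls, with a trusted-sender bypass."""

    def __init__(
        self,
        rate_per_sec: float,
        burst: int,
        trusted: Set[str],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.trusted = trusted  # hypothetical heuristic: known-good senders
        self.clock = clock
        self.last = clock()

    def allow(self, sender: str) -> bool:
        # Trusted traffic skips the bucket entirely.
        if sender in self.trusted:
            return True
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

The injectable `clock` is there so the limiter can be exercised deterministically in tests; other heuristics (for example, sender history) could replace the static allowlist.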

OP Supervisor FMA

Alternative Validation Implementation (Epic)
Higher Quality Testing (Managed in other Work Streams)
Batcher to limit publishing range to Cross-Unsafe
Sequencer to watch for Cross-Unsafe stalls to reorg
Admin API to disable Interop inclusion during incidents
Conductor to use Supervisor Liveness as potential Leadership Transfer trigger
Implement Standard Mode
Standard Mode should accept multiple Supervisors for comparison
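Two of the items above (the batcher's publishing range and the sequencer's stall watch) both key off the cross-unsafe head, so they can be sketched together. The function and class names, and the time-based stall threshold, are assumptions for illustration rather than existing batcher or sequencer behavior.

```python
from dataclasses import dataclass


def publish_upper_bound(local_unsafe: int, cross_unsafe: int) -> int:
    """Highest block number the batcher would publish: never past cross-unsafe."""
    return min(local_unsafe, cross_unsafe)


@dataclass
class StallWatcher:
    # Hypothetical threshold: how long cross-unsafe may lag without advancing.
    max_stall_secs: float
    last_cross_unsafe: int = -1
    last_advance_at: float = 0.0

    def observe(self, local_unsafe: int, cross_unsafe: int, now: float) -> bool:
        """Return True when the unsafe chain above cross-unsafe looks stalled."""
        advanced = cross_unsafe > self.last_cross_unsafe
        caught_up = cross_unsafe >= local_unsafe
        if advanced or caught_up:
            # Progress (or nothing pending): reset the stall timer.
            self.last_cross_unsafe = cross_unsafe
            self.last_advance_at = now
            return False
        return now - self.last_advance_at >= self.max_stall_secs
```

A `True` result is only a signal; whether the sequencer then reorgs immediately or waits for an operator is a policy decision outside this sketch.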


axelKingsley · 2025-04-07T18:05:36Z

Met with @zhwrd @yashvardhan-kukreja @jelias2 to discuss items they can take on.

Proxyd:

1 send raw tx which has an Interop access list : 1 call to checkAccessList
(existing send raw tx rate limit is 90 per second)
- Limit is already pretty low, are we sure an additional RL will make a difference?
- At least looking to have two different liveness controls (so interop doesn't interrupt typical traffic)
Should set up a dedicated Supervisor (again, do we need rate limit?)
3 tx ingress rpcs, so 3 supervisors (and implied nodes)
Action item:
Put together a doc with the proposed design flow (@jelias2 or @yashvardhan-kukreja ) - how specifically will proxyd route? how will it rate limit?
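As a starting point for that design discussion, the routing could look roughly like the sketch below: one check call per send raw tx that carries an Interop access list, with the interop limit applied only to that path so ordinary traffic is unaffected. The function names, the dict-shaped transaction, and the return strings are hypothetical, and the CrossL2Inbox address is assumed to be the `0x42...22` predeploy.

```python
from typing import Callable, Dict, List

# Assumption: interop access lists target the CrossL2Inbox predeploy.
CROSS_L2_INBOX = "0x4200000000000000000000000000000000000022"


def interop_keys(tx: Dict) -> List[str]:
    """Collect the storage keys of access-list entries aimed at CrossL2Inbox."""
    keys: List[str] = []
    for entry in tx.get("accessList", []):
        if entry["address"].lower() == CROSS_L2_INBOX:
            keys.extend(entry["storageKeys"])
    return keys


def route_send_raw_tx(
    tx: Dict,
    check_access_list: Callable[[List[str]], bool],  # stand-in for the Supervisor call
    forward: Callable[[Dict], None],                 # stand-in for the tx ingress backend
    interop_allowed: Callable[[], bool],             # interop-only rate limit / liveness gate
) -> str:
    keys = interop_keys(tx)
    if keys:
        # Only interop txs consult the interop gate, so typical traffic is untouched.
        if not interop_allowed():
            return "rejected: interop rate limit"
        # Exactly one Supervisor call per interop tx.
        if not check_access_list(keys):
            return "rejected: invalid interop message"
    forward(tx)
    return "forwarded"
```

Keeping `interop_allowed` separate from the existing send raw tx limit is one way to get the two independent liveness controls mentioned in the notes.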

Conductor:

Might need separate ZDD process for Supervisor
Leadership transfer based on Supervisor going down
Want to be able to selectively turn Supervisor Conductor Triggers off
Tracking things like the cross-unsafe head for leadership transfer is maybe too tricky
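The Conductor notes reduce to a small decision rule, sketched here under the assumption that Supervisor liveness is available as a boolean. `ConductorPolicy` and its field are hypothetical names; cross-unsafe head tracking is deliberately left out, per the last note.

```python
from dataclasses import dataclass


@dataclass
class ConductorPolicy:
    # Hypothetical switch so Supervisor-based triggers can be turned off selectively.
    supervisor_trigger_enabled: bool = True

    def should_transfer_leadership(
        self, sequencer_healthy: bool, supervisor_live: bool
    ) -> bool:
        """Decide whether the current leader should hand off leadership."""
        if not sequencer_healthy:
            return True
        # Supervisor liveness only counts while its trigger is enabled.
        return self.supervisor_trigger_enabled and not supervisor_live
```

With the trigger disabled, a Supervisor outage alone never causes a transfer, which is the escape hatch the notes ask for.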

axelKingsley added the M-needs-triage Meta: this issue needs to be labelled label Apr 2, 2025

axelKingsley changed the title ~~Interop: Production Infrastructure Architecture [Tracker]~~ [Tracker] Interop: Production Infrastructure Architecture Apr 2, 2025

github-project-automation bot added this to Optimism Protocol Roadmap - H1 2025 Apr 2, 2025

protolambda added H-interop Hardfork: change planned for interop upgrade and removed M-needs-triage Meta: this issue needs to be labelled labels Apr 10, 2025

axelKingsley mentioned this issue Apr 11, 2025

FMA: Interop Transaction Handling ethereum-optimism/design-docs#249

Open
