[Tracker] Interop: Production Infrastructure Architecture #15175

Open
13 tasks
axelKingsley opened this issue Apr 2, 2025 · 1 comment
Labels
H-interop Hardfork: change planned for interop upgrade

Comments

@axelKingsley
Contributor

axelKingsley commented Apr 2, 2025

These tasks need to be completed before we can realistically expect to run a public/production network.

Why This Is Important

Optimism's Native Interop depends on the ability to avoid including invalid interop messages. In practical terms, a sequencer with a functioning Supervisor can determine with a high degree of accuracy whether interop messages are cross-unsafe. However, this is not 100% accurate, because cross-unsafe data is inherently unsafe: it is an optimistic extension of the chain that could be proven false if the remote chain reorgs.
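To make the "optimistic" part concrete, here is a deliberately simplified Go model. All types and names are hypothetical and are not op-supervisor's data model, and the real check validates more than a block hash. An executing message is treated as cross-unsafe valid only while the remote block it references is still the block observed at that height, so the same message can pass a check now and fail after a remote reorg.

```go
package main

import "fmt"

// BlockRef identifies a block on a chain. Hypothetical, simplified type for
// illustration only; it is not the op-supervisor data model.
type BlockRef struct {
	ChainID uint64
	Number  uint64
	Hash    string
}

// ExecutingMessage references the remote block that holds the initiating
// message. Hypothetical type.
type ExecutingMessage struct {
	Initiating BlockRef
}

// RemoteView is this node's current view of the remote chain's unsafe blocks
// (the chain named by Initiating.ChainID), keyed by block number.
type RemoteView map[uint64]string

// IsCrossUnsafeValid reports whether the initiating block is still on the
// remote chain as currently observed. The answer is optimistic: it can flip
// to false later if the remote chain reorgs that block away.
func IsCrossUnsafeValid(msg ExecutingMessage, view RemoteView) bool {
	hash, ok := view[msg.Initiating.Number]
	return ok && hash == msg.Initiating.Hash
}

func main() {
	msg := ExecutingMessage{Initiating: BlockRef{ChainID: 10, Number: 100, Hash: "0xaaa"}}
	view := RemoteView{100: "0xaaa"}
	fmt.Println(IsCrossUnsafeValid(msg, view)) // true

	// The remote chain reorgs block 100 to a different hash.
	view[100] = "0xbbb"
	fmt.Println(IsCrossUnsafeValid(msg, view)) // false
}
```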

But in practical terms, how likely is the remote chain to reorg? Unsafe reorgs on OP Stack chains are currently extremely rare; OP Mainnet has had only one since Bedrock.

Interop-based block invalidation creates a new source of reorgs. A block builder whose unsafe chain was built on an invalid message has two options: fruitlessly extend the invalid chain and pay to publish the batch to L1 so that it can be properly replaced, or aggressively drop the stalled unsafe chain instead.

There is a vicious cycle here: Supervisor checks fail because of reorgs, and reorgs happen because Supervisor checks fail. This suggests that under normal superchain operation, invalid messages are never added to blocks. However, once an invalid message is added to a block, it opens the door for more invalid messages to enter the pipeline.
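The cascade can be sketched as a graph walk. Assume, as a simplification, that each executing message creates a dependency edge from the initiating block to the block that includes the executing message. A single reorg then invalidates the rest of its own chain plus, transitively, every block on other chains that depends on those blocks. The types and function below are hypothetical and only illustrate the cascade. They are not how the Supervisor tracks dependencies.

```go
package main

import (
	"fmt"
	"sort"
)

// Block is a hypothetical (chain, height) pair used only in this sketch.
type Block struct {
	Chain  string
	Number uint64
}

// Deps maps a block to the blocks on other chains that executed messages
// initiated in it, i.e. the blocks whose validity depends on it.
type Deps map[Block][]Block

// Invalidated returns every block that is lost when root is reorged out:
// root and all later blocks on its chain (up to heads[chain]), plus,
// transitively, every block that depends on any of them.
func Invalidated(root Block, heads map[string]uint64, deps Deps) []Block {
	seen := map[Block]bool{}
	queue := []Block{root}
	for len(queue) > 0 {
		b := queue[0]
		queue = queue[1:]
		// Reorging b also drops every later block built on top of it.
		for n := b.Number; n <= heads[b.Chain]; n++ {
			cur := Block{Chain: b.Chain, Number: n}
			if seen[cur] {
				continue
			}
			seen[cur] = true
			queue = append(queue, deps[cur]...)
		}
	}
	out := make([]Block, 0, len(seen))
	for b := range seen {
		out = append(out, b)
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Chain != out[j].Chain {
			return out[i].Chain < out[j].Chain
		}
		return out[i].Number < out[j].Number
	})
	return out
}

func main() {
	heads := map[string]uint64{"A": 3, "B": 2, "C": 1}
	deps := Deps{
		{Chain: "A", Number: 2}: {{Chain: "B", Number: 1}}, // B1 executes a message from A2
		{Chain: "B", Number: 2}: {{Chain: "C", Number: 1}}, // C1 executes a message from B2
	}
	// One reorg of A2 cascades across all three chains.
	fmt.Println(Invalidated(Block{Chain: "A", Number: 2}, heads, deps))
	// [{A 2} {A 3} {B 1} {B 2} {C 1}]
}
```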

Superchain stability will exhibit emergent behaviors that we should be prepared to respond to with tools such as Admin APIs and leadership transfers. If recurring invalid messages keep preventing block builders from advancing the chain, we need the ability to stop interop processing so that networks can resume normal operation. Once normal operation resumes, it would be safe to re-enable interop messages.
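Here is a minimal sketch of the "stop interop processing" lever. It assumes a hypothetical in-process switch that an Admin API handler could flip. While the switch is disabled, the block builder drops transactions carrying interop access lists and keeps building from regular traffic. Re-enabling it restores normal inclusion. Names such as `InteropGate` and `HasInteropAccessList` are illustrative only.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Tx is a hypothetical transaction model; HasInteropAccessList marks
// transactions that carry executing messages.
type Tx struct {
	ID                   string
	HasInteropAccessList bool
}

// InteropGate is a hypothetical admin-controlled switch. When disabled, the
// block builder skips interop transactions but keeps including regular ones.
type InteropGate struct {
	enabled atomic.Bool
}

// NewInteropGate returns a gate with interop inclusion enabled.
func NewInteropGate() *InteropGate {
	g := &InteropGate{}
	g.enabled.Store(true)
	return g
}

// SetEnabled is what a (hypothetical) admin API handler would call during
// and after an incident.
func (g *InteropGate) SetEnabled(on bool) { g.enabled.Store(on) }

// Filter returns the transactions eligible for the next block. When interop
// is enabled the input slice is returned unchanged.
func (g *InteropGate) Filter(txs []Tx) []Tx {
	if g.enabled.Load() {
		return txs
	}
	out := make([]Tx, 0, len(txs))
	for _, tx := range txs {
		if !tx.HasInteropAccessList {
			out = append(out, tx)
		}
	}
	return out
}

func main() {
	g := NewInteropGate()
	txs := []Tx{{ID: "a"}, {ID: "b", HasInteropAccessList: true}}
	fmt.Println(len(g.Filter(txs))) // 2
	g.SetEnabled(false)             // incident: stop interop inclusion
	fmt.Println(len(g.Filter(txs))) // 1
}
```

An atomic flag keeps the check cheap on the block-building hot path. The admin handler can also toggle it without taking a lock.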

We need to start gaining experience with these outcomes using Kurtosis-based devnets plus large-scale network automation, such as the NAT framework currently being developed. In the meantime, there are a number of practical component behaviors we want to implement, which we believe will improve how block builders and the transaction pipeline respond to network conditions.

Documents and their Action Items

Topology and Tx Flow for Interop chain Operation

  • proxyd to call checkAccessList for transactions with interop access lists
  • Make the Mempool ingress checkAccessList filter configurable (off for sequencers)
  • Remove block-builder checkInterop call
  • Introduce recurring checkInterop batch call in Mempool
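The mempool-side items above can be sketched together in Go. The sketch assumes a hypothetical `Checker` interface that batches access lists into one Supervisor call. The real RPC is checkAccessList, but the batch signature here is invented for illustration. `IngressCheck` models the configurable ingress filter, and `Recheck` models the recurring batch call that evicts pending transactions whose messages are no longer valid.

```go
package main

import (
	"errors"
	"fmt"
)

// Checker abstracts Supervisor access-list validation. Hypothetical
// interface: one result per access list, nil meaning "still valid".
type Checker interface {
	CheckBatch(accessLists [][]string) []error
}

// Tx is a hypothetical transaction; AccessList holds interop entries.
type Tx struct {
	ID         string
	AccessList []string
}

// Mempool is a toy pending-transaction pool.
type Mempool struct {
	IngressCheck bool // configurable ingress filter, e.g. off for sequencers
	checker      Checker
	txs          []Tx
}

// Add admits a transaction, validating interop txs at ingress if enabled.
func (m *Mempool) Add(tx Tx) error {
	if m.IngressCheck && len(tx.AccessList) > 0 {
		if err := m.checker.CheckBatch([][]string{tx.AccessList})[0]; err != nil {
			return err
		}
	}
	m.txs = append(m.txs, tx)
	return nil
}

// Recheck re-validates all pending interop txs in one batch call and evicts
// the ones that fail. It returns the number of evicted transactions.
func (m *Mempool) Recheck() int {
	var idx []int
	var lists [][]string
	for i, tx := range m.txs {
		if len(tx.AccessList) > 0 {
			idx = append(idx, i)
			lists = append(lists, tx.AccessList)
		}
	}
	if len(lists) == 0 {
		return 0
	}
	drop := map[int]bool{}
	for j, err := range m.checker.CheckBatch(lists) {
		if err != nil {
			drop[idx[j]] = true
		}
	}
	kept := m.txs[:0] // filter in place
	for i, tx := range m.txs {
		if !drop[i] {
			kept = append(kept, tx)
		}
	}
	m.txs = kept
	return len(drop)
}

// fakeChecker treats an access list as valid if every entry is in valid.
type fakeChecker struct{ valid map[string]bool }

func (f fakeChecker) CheckBatch(lists [][]string) []error {
	out := make([]error, len(lists))
	for i, l := range lists {
		for _, e := range l {
			if !f.valid[e] {
				out[i] = errors.New("invalid message: " + e)
				break
			}
		}
	}
	return out
}

func main() {
	fc := fakeChecker{valid: map[string]bool{"msgA": true, "msgB": true}}
	m := &Mempool{IngressCheck: false, checker: fc} // sequencer-style: no ingress check
	_ = m.Add(Tx{ID: "t1", AccessList: []string{"msgA"}})
	_ = m.Add(Tx{ID: "t2", AccessList: []string{"msgB"}})
	_ = m.Add(Tx{ID: "t3"})
	delete(fc.valid, "msgB")             // e.g. the remote chain reorged msgB away
	fmt.Println(m.Recheck(), len(m.txs)) // 1 2
}
```

In a node, `Recheck` would run on a `time.Ticker`. Batching keeps Supervisor round trips to one per tick rather than one per pending transaction.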

Transaction Handling FMA

  • proxyd will need a rate-limit feature to avoid overloading the Supervisor
    • This is because proxyd doesn't have an up-to-date view of the sender's balance
    • The rate limit can use heuristics so that trusted traffic still gets through

OP Supervisor FMA

  • Alternative Validation Implementation (Epic)
  • Higher Quality Testing (Managed in other Work Streams)
  • Batcher to limit publishing range to Cross-Unsafe
  • Sequencer to watch for Cross-Unsafe stalls to reorg
  • Admin API to disable Interop inclusion during incidents
  • Conductor to use Supervisor Liveness as potential Leadership Transfer trigger
  • Implement Standard Mode
  • Standard Mode should accept multiple Supervisors for comparison
axelKingsley added the M-needs-triage label (Meta: this issue needs to be labelled) on Apr 2, 2025
axelKingsley changed the title from "Interop: Production Infrastructure Architecture [Tracker]" to "[Tracker] Interop: Production Infrastructure Architecture" on Apr 2, 2025
@axelKingsley
Contributor Author

Met with @zhwrd, @yashvardhan-kukreja, and @jelias2 to discuss items they can take on.

Proxyd:

  • 1 send raw tx with an interop access list = 1 call to checkAccessList
  • (existing srtx rate limit is 90 per second)
    • The limit is already fairly low; are we sure an additional rate limit (RL) will make a difference?
    • At minimum, we're looking to have two separate liveness controls (so interop doesn't interrupt typical traffic)
  • Should set up a dedicated Supervisor (again, do we need a rate limit?)
  • 3 tx ingress RPCs, so 3 Supervisors (and implied nodes)
AI:
  • Put together a design with the proposed flow (@jelias2 or @yashvardhan-kukreja): how specifically will proxyd route? How will it rate-limit?

Conductor:

  • Might need a separate ZDD process for the Supervisor
  • Leadership transfer based on the Supervisor going down
  • Want to be able to selectively turn off Supervisor-related Conductor triggers
  • Tracking things like the cross-unsafe head for leadership transfer may be too tricky
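Here is a hypothetical sketch of a Supervisor-based leadership-transfer trigger that operators can switch off. It uses a consecutive-failure threshold so that a brief Supervisor restart, for example during a deployment, does not immediately cause a transfer. The names and the threshold are assumptions. Cross-unsafe head tracking is deliberately left out, in line with the concern above.

```go
package main

import "fmt"

// SupervisorMonitor is a hypothetical Conductor-side trigger based on
// Supervisor health probes.
type SupervisorMonitor struct {
	Enabled   bool // operators can turn this trigger off entirely
	Threshold int  // consecutive failed probes before triggering (should be >= 1)
	failures  int
}

// Probe records one health-check result and reports whether leadership
// should be transferred because of the Supervisor.
func (m *SupervisorMonitor) Probe(healthy bool) bool {
	if healthy {
		m.failures = 0 // any success resets the streak
	} else {
		m.failures++
	}
	return m.Enabled && m.failures >= m.Threshold
}

func main() {
	m := &SupervisorMonitor{Enabled: true, Threshold: 3}
	for _, ok := range []bool{false, false, true, false, false, false} {
		fmt.Print(m.Probe(ok), " ")
	}
	fmt.Println()
	// prints: false false false false false true
}
```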

protolambda added the H-interop label (Hardfork: change planned for interop upgrade) and removed the M-needs-triage label (Meta: this issue needs to be labelled) on Apr 10, 2025