
FMA: Interop Transaction Handling #249

Open
wants to merge 3 commits into fma-supervisor from fma-interopTx-filters
Conversation

axelKingsley (Contributor)
No description provided.

axelKingsley force-pushed the fma-interopTx-filters branch from ce0c767 to 6d90225 on April 1, 2025 20:31
tynes (Contributor) commented Apr 1, 2025

The framing of two extremes is a great introduction to this topic

tynes (Contributor) commented Apr 1, 2025

Is there any consideration for the batcher? Did we add some new interop-specific config?

tynes (Contributor) commented Apr 1, 2025

There could be a failure mode of disk growth, given that we have to index a lot of data. The solution seems to be pruning; we will know the depth at which we can prune once the final expiry window is set.
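
For illustration, a minimal sketch of what expiry-window pruning could look like. The `LogIndex` type is a hypothetical stand-in; the real supervisor storage layout and API may differ.

```go
package supervisor

// LogIndex is a stand-in for the supervisor's on-disk index of initiating messages.
type LogIndex interface {
	// PruneBefore drops all indexed entries with a timestamp below the cutoff.
	PruneBefore(timestamp uint64) error
}

// pruneExpiredLogs removes entries older than the message expiry window: once a
// message can no longer be referenced by a valid executing message, its index
// entry is safe to delete, which bounds disk growth.
func pruneExpiredLogs(index LogIndex, now, expiryWindow uint64) error {
	if now < expiryWindow {
		return nil // chain is younger than the window; nothing has expired yet
	}
	return index.PruneBefore(now - expiryWindow)
}
```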

SozinM commented Apr 2, 2025

Another, more esoteric option for checks:
When building block N, we validate all cross transactions destined for block N+1 with a 2-second timeout.
Then, while building block N+1, we would have very fresh transactions that we could include.
This assumes that it's possible to validate the batch in under 2 seconds, so the results are ready in time for the next block-building round.
This will load the supervisor, as it would make a lot of checks every 2 seconds.
To improve this a bit, we could validate only the number of transactions that would most likely be included in the block (if we have 100k txs in the mempool, it's obvious that we won't include them all in the next block).
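
A rough sketch of this pre-validation idea follows, under stated assumptions: `TxPool`, `Tx`, and `SupervisorClient` are hypothetical stand-ins, not existing op-geth or op-supervisor types.

```go
package builder

import (
	"context"
	"time"
)

type Tx interface {
	AccessList() [][32]byte
}

type TxPool interface {
	// PendingCrossTxs returns up to limit cross-chain candidate txs from the pool.
	PendingCrossTxs(limit int) []Tx
}

type SupervisorClient interface {
	CheckAccessList(ctx context.Context, accessList [][32]byte) error
}

// prevalidateForNextBlock checks roughly one block's worth of cross txs under a
// 2-second deadline while the current block is being built. Anything not
// reached before the deadline simply waits for the next round.
func prevalidateForNextBlock(ctx context.Context, pool TxPool, sup SupervisorClient, perBlockLimit int) []Tx {
	ctx, cancel := context.WithTimeout(ctx, 2*time.Second)
	defer cancel()

	valid := make([]Tx, 0, perBlockLimit)
	for _, tx := range pool.PendingCrossTxs(perBlockLimit) {
		if ctx.Err() != nil {
			break // out of budget for this round
		}
		if err := sup.CheckAccessList(ctx, tx.AccessList()); err == nil {
			valid = append(valid, tx)
		}
	}
	return valid
}
```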

- We should deploy at least one "Utility Supervisor" to respond to Checks from `proxyd` instances.
The size and quantity of the Supervisor(s) could be scaled if needed. (Note: A Supervisor also requires Nodes
of each network to function)
## FM3b: Transaction Volume causes DOS Failures of Supervisor
Contributor commented:

Something to call out here that's different in ingress than in tx-pool usage (*): transactions can declare a very, very long access list that they cannot actually afford to include, since gas limit vs. gas used and the sender balance are not checked yet.

Two possible solutions:

  1. Rate-limit the access-list checks, e.g. configure a burst (= per-tx access-list size limit) and a rate (= total load on the supervisor) that may be checked.
  2. Do the balance/nonce check before the access-list check, in ingress.

Option 1 may be sufficient, if the rate is reasonable, even if txs do not pay. This largely depends on supervisor performance. On the bright side, these checks should be relatively quick (binary search a flat DB) and can run in parallel (different proxyd RPC handler routines, different supervisor RPC handler routines).

Option 2 is going to add load to EL nodes, and is thus less favorable from a DoS perspective. It's also easy to get out of sync (since you would need to look at the pending nonce, not the latest state, and either may lag behind on a verifier node).
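
As a minimal sketch of what option 1 could look like, using golang.org/x/time/rate for the token bucket; the checker and supervisor types here are hypothetical, not actual proxyd code:

```go
package ingress

import (
	"context"
	"errors"

	"golang.org/x/time/rate"
)

type SupervisorClient interface {
	CheckAccessList(ctx context.Context, accessList [][32]byte) error
}

// AccessListChecker shares one token bucket across all RPC handler routines:
// the burst doubles as the per-tx access-list size cap, and the rate bounds
// aggregate supervisor load regardless of how many txs arrive.
type AccessListChecker struct {
	limiter    *rate.Limiter // e.g. rate.NewLimiter(1000, 64): 1000 key checks/s, 64 keys max per tx
	supervisor SupervisorClient
}

var errAccessListTooLarge = errors.New("access list exceeds per-tx check limit")

func (c *AccessListChecker) CheckTx(ctx context.Context, accessList [][32]byte) error {
	n := len(accessList)
	if n > c.limiter.Burst() {
		// The tx declares more keys than we are ever willing to check at once.
		return errAccessListTooLarge
	}
	// Consume n tokens, waiting if needed, so total supervisor load stays bounded
	// even when txs never pay for the work they cause.
	if err := c.limiter.WaitN(ctx, n); err != nil {
		return err
	}
	return c.supervisor.CheckAccessList(ctx, accessList)
}
```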

axelKingsley (Contributor, Author) replied:

Regarding Option 2 getting out of sync: the EL nodes could do all this checking through an API, right? We could make checkInterop an API of Geth, which does the balance checks and then calls the Supervisor if needed. Then we can manage just one checking implementation, in Geth.
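
A hypothetical sketch of the shape such a checkInterop entrypoint could take (illustrative names only, not actual Geth APIs): run the cheap local nonce/balance checks first, and only call out to the Supervisor if they pass.

```go
package el

import (
	"context"
	"errors"
	"math/big"
)

type Address [20]byte

type PendingState interface {
	Nonce(addr Address) uint64
	Balance(addr Address) *big.Int
}

type SupervisorAPI interface {
	CheckAccessList(ctx context.Context, accessList [][32]byte) error
}

type InteropTx struct {
	From       Address
	Nonce      uint64
	Cost       *big.Int // max gas cost plus transferred value
	AccessList [][32]byte
}

// CheckInterop keeps a single checking implementation in the EL node: local
// checks gate the more expensive cross-chain message check.
func CheckInterop(ctx context.Context, state PendingState, sup SupervisorAPI, tx InteropTx) error {
	if state.Nonce(tx.From) > tx.Nonce {
		return errors.New("stale nonce")
	}
	if state.Balance(tx.From).Cmp(tx.Cost) < 0 {
		return errors.New("insufficient balance to cover tx cost")
	}
	// Local checks passed; defer cross-chain message validity to the Supervisor.
	return sup.CheckAccessList(ctx, tx.AccessList)
}
```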

axelKingsley (Contributor, Author) replied:

And I will add this callout and action item to this document, thanks.

axelKingsley (Contributor, Author) commented:

> Another, more esoteric option for checks: When building block N, we validate all cross transactions destined for block N+1 with a 2-second timeout. Then, while building block N+1, we would have very fresh transactions that we could include. This assumes that it's possible to validate the batch in under 2 seconds, so the results are ready in time for the next block-building round. This will load the supervisor, as it would make a lot of checks every 2 seconds. To improve this a bit, we could validate only the number of transactions that would most likely be included in the block (if we have 100k txs in the mempool, it's obvious that we won't include them all in the next block)

@SozinM thanks for this idea! I think this is functionally equivalent to our decision to batch-evaluate these messages on a recurring timer. If the timer happened to match the block building time, you'd achieve the same effect, where transactions get evaluated in anticipation of the next block. Hooking it directly to block number sounds neat, but I wouldn't want to put validation in the way of our block building timing.

And I totally agree regarding only validating one block's worth in advance. The nice thing is that we can statically determine whether transactions are interop or not, so if there are 100k txs in the pool, we will only need to look over them once to identify the interop ones, and batch-check those (which would be a small subset of all pending txs).
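
A small sketch of that one-pass static filter (hypothetical names; the real marker for interop txs may differ, e.g. a specific access-list entry):

```go
package txpool

type Tx interface {
	AccessList() [][32]byte
}

// isInteropTx is a stand-in for whatever static property identifies an
// executing message; no state access or execution is needed to evaluate it.
func isInteropTx(tx Tx) bool {
	return len(tx.AccessList()) > 0
}

// filterInteropTxs scans the pending set once, so even a 100k-tx pool collapses
// to a small candidate set before any Supervisor calls are made.
func filterInteropTxs(pending []Tx) []Tx {
	interop := make([]Tx, 0)
	for _, tx := range pending {
		if isInteropTx(tx) {
			interop = append(interop, tx)
		}
	}
	return interop
}
```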

axelKingsley marked this pull request as ready for review April 4, 2025 19:57
axelKingsley (Contributor, Author) commented:

Takeaways from the review meeting:

  • We are comfortable disabling interop as needed, when processing interop leads to negative externalities for the rest of the chain
    • Expiring messages are the largest concern
    • Kelvin points out that you can simply have the app layer re-emit messages
  • One big risk we'd like to track and understand is a chaotic cycle of sequencers initiating reorgs
    • The "smarter" a sequencer is about avoiding invalid blocks, the more reorgs it creates
    • If there were too many reorgs, chains may stomp on each other and cause widespread issues
    • To mitigate this, we want lots of testing of the emergent behaviors, using dev networks with test sequencers.
