-
Notifications
You must be signed in to change notification settings - Fork 7.8k
Fix GH-11240: PHP crashes on big file with array inside #11272
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The crash happens because there's a stack overflow in the recursive SCC algorithm. Fix this by transforming it into an iterative implementation of the same algorithm. We manually keep the recursion stack now. I tested the correctness by running the CI test suite using both the old implementation and the new implementation and letting the test fail if the SCC values differ. The tests passed without failure. For the test case of OP, I benchmarked the performance: With this patch Time (mean ± σ): 645.9 ms ± 7.4 ms [User: 603.7 ms, System: 40.7 ms] Range (min … max): 634.0 ms … 659.3 ms 10 runs Without this patch Time (mean ± σ): 755.3 ms ± 18.1 ms [User: 698.8 ms, System: 55.2 ms] Range (min … max): 737.6 ms … 784.4 ms 10 runs We can see an improvement in performance as well because the function call overhead and control transfer overhead is eliminated now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to spend a bit more time converting this algorithm or finding the reference iterative implementation. See https://arxiv.org/pdf/1703.10023.pdf for example.
/* Need to come back from "recursion" */ \ | ||
recursion_stack[recursion_stack_idx].var = var; \ | ||
recursion_stack[recursion_stack_idx].current_use = use; \ | ||
recursion_stack[recursion_stack_idx].current_phi = p; \ | ||
STORE_CURRENT_SYM; \ | ||
recursion_stack_idx++; \ | ||
/* "Recurse" on next var */ \ | ||
var = var2; \ | ||
use = ssa->vars[var].use_chain; \ | ||
p = ssa->vars[var].phi_use_chain; \ | ||
sym_p = ssa->vars[var].sym_use_chain; \ | ||
dfs[var] = *index; \ | ||
(*index)++; \ | ||
root[var] = var; \ | ||
goto recurse; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maintaining of stack with 3 additional lists looks a bit weird to me.
Did you use some reference algorithm implementation or invent this yourself?
Also see an implementation of improved algorithm https://github.com/DavePearce/StronglyConnectedComponents |
I checked out https://arxiv.org/pdf/1703.10023.pdf. It does provide some optimisation ideas, one of which already happens (combine visited[] and dfs[] into a single array dfs). And it also gives some more ideas. It does not however provide a reference algorithm for iterative SCC.
I did it myself. Let me explain the additional items on the stack. In Dave Pearce's algorithm we'd still have to maintain the additional items on the stack. |
Let me know when I should review this once again. |
Closed in favour of #12528 |
The crash happens because there's a stack overflow in the recursive SCC algorithm. Fix this by transforming it into an iterative implementation of the same algorithm. We manually keep the recursion stack now.
I tested the correctness by running the CI test suite using both the old implementation and the new implementation and letting the test fail if the SCC values differ. The tests passed without failure.
For the test case of OP, I benchmarked the performance:
With this patch
Without this patch
We can see an improvement in performance as well because the function call overhead and control transfer overhead is eliminated now.