Segmentation fault when generating composer autoload with 50k+ classes #11795

Closed
oprypkhantc opened this issue Jul 25, 2023 · 5 comments · Fixed by #12528

@oprypkhantc

Description

Hey.

Running composer dumpautoload --optimize-autoloader twice on a large codebase (apparently 50k+ classes) with OPcache enabled in Docker results in a segmentation fault. Changing OPcache settings doesn't seem to make any difference.

This is reproducible on x86_64, but not on an ARM MacBook. PHP 8.1.9 is also affected.

I've prepared a small repository with reproduction: https://github.com/oprypkhantc/php-composer-segfault/tree/main

docker run -w /app -v "$PWD:/app" -it $(docker build -q .) i --ignore-platform-reqs
docker run -w /app -v "$PWD:/app" -it $(docker build -q .) dump -o
docker run -w /app -v "$PWD:/app" -it $(docker build -q .) dump -o

The first dump -o exits with code 0 and this output:

Generating optimized autoload files
Warning: ...
Generated optimized autoload files containing 61131 classes

The second dump -o exits with code 139 and this output (note the line missing compared to the first run):

Generating optimized autoload files
Warning: ...
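
Exit code 139 follows the usual shell convention of reporting a process killed by signal N as 128 + N; signal 11 is SIGSEGV, which matches the segfault. A minimal C sketch of that decoding (the helper name is hypothetical, not part of PHP or Docker):

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical helper: shells (and docker run, which passes the container's
 * status through) report a child killed by signal N as exit status 128 + N.
 * Returns the signal number, or 0 if the status is an ordinary exit code. */
static int signal_from_exit_status(int status)
{
    assert(status >= 0 && status <= 255);
    return status > 128 ? status - 128 : 0;
}
```

With this convention, the second run's 139 decodes to 11, i.e. SIGSEGV, while the first run's 0 is a normal exit.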

PHP Version

8.2.8

Operating System

Ubuntu 20.04

@iluuu1994
Member

I guess @nielsdos beat me to it 🙂 But I can reproduce this:


In function ::{main} (before dfa):
cycle in uses of 2 (CV $vendorDir)

$_main:
     ; (lines=122268, args=0, vars=2, tmps=61136, ssa_vars=122267, no_loops)
     ; (at SSA integrity verification)
     ; /home/ilutov/Developer/php-composer-segfault/vendor/composer/autoload_classmap.php:1-61141
     ; return  [array [string] of [string]]
     ; #0.CV0($vendorDir) [undef, ref, any]
     ; #1.CV1($baseDir) [undef, ref, any]
BB0:
     ; start exit lines=[0-122267]
     ; level=0
0000 ASSIGN #0.CV0($vendorDir) [undef, ref, any] -> #2.CV0($vendorDir) [ref, any] string("/home/ilutov/Developer/php-composer-segfault/vendor")
0001 INIT_FCALL 1 96 string("dirname")
0002 SEND_VAR #2.CV0($vendorDir) [ref, any] 1
0003 #3.V4 [string] = DO_ICALL
0004 ASSIGN #1.CV1($baseDir) [undef, ref, any] -> #4.CV1($baseDir) [ref, any] #3.V4 [string]
0005 #5.T6 [string] = CONCAT #2.CV0($vendorDir) [ref, any] string("/aws/aws-crt-php/src/AWS/CRT/Auth/AwsCredentials.php")
0006 #6.T7 [array [string] of [string]] = INIT_ARRAY 61131 #5.T6 [string] string("AWS\\CRT\\Auth\\AwsCredentials")
0007 #7.T8 [string] = CONCAT #2.CV0($vendorDir) [ref, any] string("/aws/aws-crt-php/src/AWS/CRT/Auth/CredentialsProvider.php")
0008 ADD_ARRAY_ELEMENT #7.T8 [string] string("AWS\\CRT\\Auth\\CredentialsProvider") #6.T7 [array [string] of [string]] -> #8.T7 [array [string] of [string]]
0009 #9.T9 [string] = CONCAT #2.CV0($vendorDir) [ref, any] string("/aws/aws-crt-php/src/AWS/CRT/Auth/Signable.php")
0010 ADD_ARRAY_ELEMENT #9.T9 [string] string("AWS\\CRT\\Auth\\Signable") #8.T7 [array [string] of [string]] -> #10.T7 [array [string] of [string]]
0011 #11.T10 [string] = CONCAT #2.CV0($vendorDir) [ref, any] string("/aws/aws-crt-php/src/AWS/CRT/Auth/SignatureType.php")
...
122262 ADD_ARRAY_ELEMENT #122261.T61135 [string] string("voku\\helper\\SimpleXmlDomNodeBlank") #122260.T7 [array [string] of [string]] -> #122262.T7 [array [string] of [string]]
122263 #122263.T61136 [string] = CONCAT #2.CV0($vendorDir) [ref, any] string("/voku/simple_html_dom/src/voku/helper/SimpleXmlDomNodeInterface.php")
122264 ADD_ARRAY_ELEMENT #122263.T61136 [string] string("voku\\helper\\SimpleXmlDomNodeInterface") #122262.T7 [array [string] of [string]] -> #122264.T7 [array [string] of [string]]
122265 #122265.T61137 [string] = CONCAT #2.CV0($vendorDir) [ref, any] string("/voku/simple_html_dom/src/voku/helper/XmlDomParser.php")
122266 ADD_ARRAY_ELEMENT #122265.T61137 [string] string("voku\\helper\\XmlDomParser") #122264.T7 [array [string] of [string]] -> #122266.T7 [array [string] of [string]]
122267 RETURN #122266.T7 [array [string] of [string]]
php-dev: /home/ilutov/Developer/php-src/Zend/Optimizer/ssa_integrity.c:401: ssa_verify_integrity: Assertion `0 && "SSA integrity verification failed"' failed.

This can also be reproduced with php -d opcache.enable_cli=1 vendor/composer/autoload_classmap.php, so there is no need to wait for the autoloader to be generated (which can be very slow with a debug build).
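
If the reproduction repository is not at hand, a file of similar shape can be generated. The C sketch below is hypothetical: the class names and paths are made up, and the header only approximates what Composer writes. Each entry has the form 'Class' => $vendorDir . '/path', which is the pattern that appears in the opcode dump as one CONCAT on $vendorDir per entry.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical generator for a synthetic classmap shaped roughly like
 * vendor/composer/autoload_classmap.php: two assignments followed by one
 * large array literal. Returns the number of entries written, or -1 on a
 * write error. */
static long write_synthetic_classmap(FILE *out, long entries)
{
    assert(out != NULL && entries >= 0);
    if (fputs("<?php\n\n"
              "$vendorDir = dirname(__DIR__);\n"
              "$baseDir = dirname($vendorDir);\n\n"
              "return array(\n", out) == EOF)
        return -1;
    for (long i = 0; i < entries; i++) {
        /* Every entry adds one more use of $vendorDir; "\\\\" emits the
         * doubled backslash used in single-quoted PHP class names. */
        if (fprintf(out,
                    "    'Synthetic\\\\Class%ld' => "
                    "$vendorDir . '/synthetic/src/Class%ld.php',\n",
                    i, i) < 0)
            return -1;
    }
    if (fputs(");\n", out) == EOF)
        return -1;
    return entries;
}
```

For example, calling write_synthetic_classmap(stdout, 61131) from a small main, redirecting the output into a .php file and running php -d opcache.enable_cli=1 on it may exercise the same optimizer path as the real classmap.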

@nielsdos
Member

Race ya ;) /jk

I got the same cycle error, but it might be a red herring, because the check assumes that more than 10000 uses means a cycle:

if (++c > 10000) {
  FAIL("cycle in uses of " VARFMT "\n", VAR(i));
  goto finish;
}

There are more uses than that here.
I have a feeling it's actually the strongly connected components (SCC) algorithm overflowing the stack.
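
Why the check can misfire: $vendorDir gets one CONCAT use per classmap entry in the dump, so its use chain is over 61,000 entries long without containing any cycle. The C sketch below models a use chain as a simple linked list; the struct and function names are hypothetical, and only the counter-and-limit logic mirrors the quoted snippet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of a use chain: each use points to the next use of the
 * same variable, and NULL terminates the chain. */
typedef struct use_node {
    struct use_node *next;
} use_node;

/* Hypothetical re-creation of the heuristic quoted above: report a "cycle"
 * once more than `limit` uses have been visited. A real cycle is caught,
 * but so is any acyclic chain longer than `limit`. */
static bool looks_cyclic(const use_node *head, size_t limit)
{
    size_t c = 0;
    for (const use_node *u = head; u != NULL; u = u->next) {
        if (++c > limit)
            return true;
    }
    return false;
}
```

With limit 10000, an acyclic chain of 20000 uses is reported as a cycle, which matches the false positive described here; raising the limit makes it go away.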

@nielsdos
Member

I raised the 10000 limit and the cycle error is gone.
I can reproduce it reliably in the Docker container.
On a debug build I can only reproduce it if I lower the stack limit using ulimit -s, and it looks like I was right:

==463345==ERROR: AddressSanitizer: stack-overflow on address 0x7ffcbe402ffc (pc 0x55f4a810ce86 bp 0x7ffcbe403030 sp 0x7ffcbe402fe0 T0)
    #0 0x55f4a810ce86 in zend_ssa_check_scc_var /run/media/niels/MoreData/php-src/Zend/Optimizer/zend_inference.c:176
    #1 0x55f4a810df39 in zend_ssa_check_scc_var /run/media/niels/MoreData/php-src/Zend/Optimizer/zend_inference.c:185
    #2 0x55f4a810df39 in zend_ssa_check_scc_var /run/media/niels/MoreData/php-src/Zend/Optimizer/zend_inference.c:185
...

repeat that a couple of hundred times...

It's because the SCC algorithm is recursive instead of iterative.
This is actually a duplicate of GH-11240 and could be fixed by #11272. However, Dmitry remarked that I should use a reference implementation instead of writing my own iterative version.
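
To illustrate why recursion is the issue, here is a minimal C sketch; it is not php-src code, and the chain graph and all names are hypothetical. A recursive DFS that still has work to do after each child call keeps one stack frame alive per vertex on the current path, so on a chain its depth equals the chain length. In the dump above, the array temporary T7 is redefined by every ADD_ARRAY_ELEMENT, so its SSA versions form a chain roughly 61,000 links long, which plausibly explains a recursion deep enough to exhaust the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chain graph: vertex i has a single edge to i + 1. */
enum { CHAIN_LEN = 1000 };

static int visited[CHAIN_LEN];
static size_t finish_order[CHAIN_LEN];
static size_t finished;
static size_t max_depth;

/* Recursive DFS: `depth` is the number of frames currently on the stack. */
static void visit(size_t v, size_t depth)
{
    visited[v] = 1;
    if (depth > max_depth)
        max_depth = depth;
    size_t next = v + 1;
    if (next < CHAIN_LEN && !visited[next])
        visit(next, depth + 1);
    /* Work after the recursive call (like a low-link update in an SCC
     * pass) means this frame cannot be discarded before the child returns. */
    finish_order[finished++] = v;
}
```

Calling visit(0, 1) reaches a depth of CHAIN_LEN frames, so the stack needed grows linearly with the length of the chain.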

I can probably resume work on that PR, or someone else can if they want (I'm kinda deep down in ext/dom troubles). There are links to relevant papers and implementations in the comment section. I believe the implementation I wrote can be adapted fairly easily to those reference algorithms, since they share the same core idea: basically going down and back up the DFS with a flag.
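
For illustration only, here is a minimal iterative version of Tarjan's SCC algorithm in C. It uses an explicit stack of (vertex, next edge) frames instead of recursion. It is neither the code from #11272 nor one of the reference implementations linked there, and the CSR graph layout and all names are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

/* Graph in compressed sparse row form: successors of v are
 * adj[off[v]] .. adj[off[v + 1] - 1]. */
typedef struct {
    int n;
    const int *off;
    const int *adj;
} graph;

/* Iterative Tarjan: writes an SCC id for every vertex into scc[] and returns
 * the number of SCCs, or -1 on allocation failure. Ids come out in reverse
 * topological order, as with the recursive algorithm. */
static int tarjan_iterative(const graph *g, int *scc)
{
    int n = g->n, next_index = 0, count = 0, sp = 0, cp = 0;
    if (n == 0)
        return 0;
    int *index = malloc(n * sizeof *index);
    int *low = malloc(n * sizeof *low);
    int *stack = malloc(n * sizeof *stack);   /* Tarjan's vertex stack */
    char *on_stack = calloc(n, 1);
    int *call_v = malloc(n * sizeof *call_v); /* explicit DFS "call stack" */
    int *call_e = malloc(n * sizeof *call_e); /* next edge to try per frame */
    if (!index || !low || !stack || !on_stack || !call_v || !call_e) {
        count = -1;
        goto done;
    }
    for (int i = 0; i < n; i++)
        index[i] = -1;

    for (int root = 0; root < n; root++) {
        if (index[root] != -1)
            continue;
        call_v[cp] = root; call_e[cp] = g->off[root]; cp++;
        index[root] = low[root] = next_index++;
        stack[sp++] = root; on_stack[root] = 1;

        while (cp > 0) {
            int v = call_v[cp - 1];
            if (call_e[cp - 1] < g->off[v + 1]) {
                int w = g->adj[call_e[cp - 1]++];
                if (index[w] == -1) {
                    /* Going down: push a frame instead of recursing. */
                    call_v[cp] = w; call_e[cp] = g->off[w]; cp++;
                    index[w] = low[w] = next_index++;
                    stack[sp++] = w; on_stack[w] = 1;
                } else if (on_stack[w] && index[w] < low[v]) {
                    low[v] = index[w];
                }
                continue;
            }
            /* Going up: all edges of v are done, so "return" from v. */
            if (low[v] == index[v]) {
                int w;
                do {
                    w = stack[--sp];
                    on_stack[w] = 0;
                    scc[w] = count;
                } while (w != v);
                count++;
            }
            cp--;
            if (cp > 0 && low[v] < low[call_v[cp - 1]])
                low[call_v[cp - 1]] = low[v];
        }
    }
done:
    free(index); free(low); free(stack);
    free(on_stack); free(call_v); free(call_e);
    return count;
}
```

Each explicit frame stores only the vertex and the index of the next edge to try. Memory moves from the C stack to the heap and is bounded by the number of vertices, so a chain of 100,000 vertices is handled without any deep recursion.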

@dstogov
Member

dstogov commented Jul 31, 2023

@nielsdos your approach to converting the SCC algorithm might be right, but I wouldn't hurry to fix it.

Applying the original algorithm might not be efficient in the first place.
Note that by the time we identify SCCs, we have already identified control-flow loops and reducibility (or irreducibility).
So we may evaluate only the part of the data flow graph that is nested in loops.
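
One possible reading of this suggestion, sketched under stated assumptions. In strict SSA form every data-flow cycle has to pass through a phi fed along a loop back edge. A function without loops (the dump above is marked no_loops) therefore has an acyclic data-flow graph, and every variable is its own SCC. If variables are numbered in definition order, as the #N numbering in the dump appears to be, a linear scan can confirm this without any DFS. The CSR layout and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Data-flow edges in CSR form: variable v feeds uses[off[v]] ..
 * uses[off[v + 1] - 1]. Assumes variables are numbered in definition order. */
static bool all_edges_forward(int n, const int *off, const int *uses)
{
    for (int v = 0; v < n; v++)
        for (int e = off[v]; e < off[v + 1]; e++)
            if (uses[e] <= v)
                return false; /* edge to an earlier (or the same) variable */
    return true;
}

/* If every edge goes to a higher-numbered variable, the graph is acyclic,
 * so each variable is a singleton SCC and no DFS is needed. Returns false
 * when the caller must fall back to a full SCC pass. */
static bool trivial_sccs(int n, const int *off, const int *uses, int *scc)
{
    assert(n >= 0);
    if (!all_edges_forward(n, off, uses))
        return false;
    for (int v = 0; v < n; v++)
        scc[v] = v;
    return true;
}
```

Under these assumptions, a full SCC pass would only be needed for code that actually contains loops, which is the part of the graph this comment suggests restricting the work to.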

@danog
Contributor

danog commented Dec 13, 2023

Might not be completely fixed, see #12953

5 participants