Skip to content

Bug #76844 PHP crashes on big file with array inside #11240

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
MikeRich88 opened this issue May 14, 2023 · 2 comments · Fixed by #12528
Closed

Bug #76844 PHP crashes on big file with array inside #11240

MikeRich88 opened this issue May 14, 2023 · 2 comments · Fixed by #12528

Comments

@MikeRich88
Copy link

MikeRich88 commented May 14, 2023

Description

https://bugs.php.net/bug.php?id=76844

I typed up a really nice thing and then the browser crashed so I'm not doing that again 😭

<?php

$factor = 10000;

$num = 62;
$file = '';
for($i=0;$i<100000;$i++) {
	$file .= "\$wall_bans[$num]['".md5($i)."'] = true;\n";
	if($i % $factor == 0) {
		$num++;
	}
}

file_put_contents('bigarray.php', "<?php\n\$wall_bans = [];".$file."\n?>");

echo "OK, now run: php -d opcache.enable_cli=1 bigarray.php\n";

?>

Run code, then run the code it makes.

it does it on PHP 8.2.5 on CentOS 7 and PHP 8.2.6 on Mac Venture whatever (after you fix the install). This bug was never fixed.

PHP Version

8.2.5

Operating System

No response

@nielsdos
Copy link
Member

nielsdos commented May 14, 2023

Here's a PoC patch (on top of current PHP-8.1 branch) that solves the issue by modifying Tarjan's SCC implementation to an iterative approach. The transformation is quite mechanical so perhaps there's room for improvement. I ran out of time for today so I didn't have the chance to check the correctness, or how well it performs vs the old recursive implementation. An extra pair of eyes is welcome as always :)

** old patch **

```diff diff --git a/Zend/Optimizer/zend_inference.c b/Zend/Optimizer/zend_inference.c index 91a142a9f8..7e2eb433c2 100644 --- a/Zend/Optimizer/zend_inference.c +++ b/Zend/Optimizer/zend_inference.c @@ -74,18 +74,6 @@ } \ } while (0)

-#define CHECK_SCC_VAR(var2) \

  • do { \
  •   if (!ssa->vars[var2].no_val) { \
    
  •   	if (dfs[var2] < 0) { \
    
  •   		zend_ssa_check_scc_var(op_array, ssa, var2, index, dfs, root, stack); \
    
  •   	} \
    
  •   	if (ssa->vars[var2].scc < 0 && dfs[root[var]] >= dfs[root[var2]]) { \
    
  •   		root[var] = root[var2]; \
    
  •   	} \
    
  •   } \
    
  • } while (0)

#define CHECK_SCC_ENTRY(var2)
do {
if (ssa->vars[var2].scc != ssa->vars[var].scc) {
@@ -172,43 +160,78 @@ static inline bool sub_will_overflow(zend_long a, zend_long b) {
}
#endif

-static void zend_ssa_check_scc_var(const zend_op_array *op_array, zend_ssa *ssa, int var, int *index, int *dfs, int *root, zend_worklist_stack stack) / {{{ */
+static void zend_ssa_check_scc_var(const zend_op_array *op_array, zend_ssa *ssa, int var, int *index, int *dfs, int *root, zend_worklist_stack *stack)
{
-#ifdef SYM_RANGE

  • zend_ssa_phi *p;
    -#endif
  • zend_worklist_stack node_stack;
  • ALLOCA_FLAG(node_stack_use_heap);
  • /* Format: first we have the from node, then we have to to node.
    • If the from node is negative, then we haven't gone back yet. */
  • ZEND_WORKLIST_STACK_ALLOCA(&node_stack, ssa->vars_count * 2, node_stack_use_heap);

+#define CHECK_SCC_VAR(var2) \

  • do { \

  •   if (!ssa->vars[var2].no_val) { \
    
  •   	if (dfs[var2] < 0) { \
    
  •   		dfs[var2] = *index; \
    
  •   		(*index)++; \
    
  •   		root[var2] = var2; \
    
  •   		zend_worklist_stack_push(&node_stack, -var - 1); \
    
  •   		zend_worklist_stack_push(&node_stack, var2); \
    
  •   	} \
    
  •   } \
    
  • } while (0)

  • zend_worklist_stack_push(&node_stack, -var - 1);

  • zend_worklist_stack_push(&node_stack, var);

    dfs[var] = *index;
    (*index)++;
    root[var] = var;

  • FOR_EACH_VAR_USAGE(var, CHECK_SCC_VAR);
  • while (node_stack.len > 0) {

  •   var = zend_worklist_stack_peek(&node_stack);
    
  •   int *state = &node_stack.buf[node_stack.len - 2];
    
  •   /* searching further */
    
  •   if (*state < 0) {
    
  •   	*state = -*state;
    

#ifdef SYM_RANGE

  • /* Process symbolic control-flow constraints */
  • p = ssa->vars[var].sym_use_chain;
  • while (p) {
  •   CHECK_SCC_VAR(p->ssa_var);
    
  •   p = p->sym_use_chain;
    
  • }
  •   	/* Process symbolic control-flow constraints */
    
  •   	zend_ssa_phi *p = ssa->vars[var].sym_use_chain;
    
  •   	while (p) {
    
  •   		CHECK_SCC_VAR(p->ssa_var);
    
  •   		p = p->sym_use_chain;
    
  •   	}
    

#endif

  •   	FOR_EACH_VAR_USAGE(var, CHECK_SCC_VAR);
    
  •   }
    
  •   /* coming back */
    
  •   else {
    
  •   	if (root[var] == var) {
    
  •   		ssa->vars[var].scc = ssa->sccs;
    
  •   		while (stack->len > 0) {
    
  •   			int var2 = zend_worklist_stack_peek(stack);
    
  •   			if (dfs[var2] <= dfs[var]) {
    
  •   				break;
    
  •   			}
    
  •   			zend_worklist_stack_pop(stack);
    
  •   			ssa->vars[var2].scc = ssa->sccs;
    
  •   		}
    
  •   		ssa->sccs++;
    
  •   	} else {
    
  •   		zend_worklist_stack_push(stack, var);
    
  •   	}
    
  • if (root[var] == var) {
  •   ssa->vars[var].scc = ssa->sccs;
    
  •   while (stack->len > 0) {
    
  •   	int var2 = zend_worklist_stack_peek(stack);
    
  •   	if (dfs[var2] <= dfs[var]) {
    
  •   		break;
    
  •   	zend_worklist_stack_pop(&node_stack); /* pop var */
    
  •   	int from = zend_worklist_stack_pop(&node_stack) - 1;
    
  •   	if (ssa->vars[from].scc < 0 && dfs[root[var]] >= dfs[root[from]]) {
    
  •   		root[var] = root[from];
      	}
    
  •   	zend_worklist_stack_pop(stack);
    
  •   	ssa->vars[var2].scc = ssa->sccs;
      }
    
  •   ssa->sccs++;
    
  • } else {
  •   zend_worklist_stack_push(stack, var);
    
    }
  • ZEND_WORKLIST_STACK_FREE_ALLOCA(&node_stack, node_stack_use_heap);
    }
    -/* }}} */

ZEND_API int zend_ssa_find_sccs(const zend_op_array *op_array, zend_ssa ssa) / {{{ */
{


EDIT: I benchmarked (by putting the stack limit to unlimited).
**NEW**
  Time (mean ± σ):     517.3 ms ±   3.5 ms    [User: 486.4 ms, System: 28.7 ms]
  Range (min … max):   513.0 ms … 523.5 ms    10 runs
**OLD**
  Time (mean ± σ):     530.6 ms ±   4.8 ms    [User: 492.9 ms, System: 35.5 ms]
  Range (min … max):   524.9 ms … 539.9 ms    10 runs

So no performance degradation by switching to an iterative approach.

</details>

EDIT: I'm currently working on a new patch for this based on the above PoC.

nielsdos added a commit to nielsdos/php-src that referenced this issue May 18, 2023
The crash happens because there's a stack overflow in the recursive SCC
algorithm. Fix this by transforming it into an iterative implementation
of the same algorithm. We manually keep the recursion stack now.

I tested the correctness by running the CI test suite using both the old
implementation and the new implementation and letting the test fail if
the SCC values differ. The tests passed without failure.

For the test case of OP, I benchmarked the performance:
With this patch
  Time (mean ± σ):     645.9 ms ±   7.4 ms    [User: 603.7 ms, System: 40.7 ms]
  Range (min … max):   634.0 ms … 659.3 ms    10 runs
Without this patch
  Time (mean ± σ):     755.3 ms ±  18.1 ms    [User: 698.8 ms, System: 55.2 ms]
  Range (min … max):   737.6 ms … 784.4 ms    10 runs

We can see an improvement in performance as well because the function
call overhead and control transfer overhead is eliminated now.
@nielsdos nielsdos linked a pull request May 18, 2023 that will close this issue
@nielsdos nielsdos linked a pull request Oct 26, 2023 that will close this issue
@MikeRich88
Copy link
Author

Wanted to come back and say thanks to everyone. It's good to put one in the win column sometimes. (whereas my Apple bug reports seem to be printed out directly into a shredder)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants