Generated baseline can't be used if regex can't be compiled to UTF-8 chars #3835

Draft: wants to merge 14 commits into base: 1.12.x
3 changes: 3 additions & 0 deletions .github/workflows/e2e-tests.yml
@@ -322,6 +322,9 @@ jobs:
          - script: |
              cd e2e/bug-11819
              ../../bin/phpstan
          - script: |
              cd e2e/bug-12629
              ../../bin/phpstan
    steps:
      - name: "Checkout"
31 changes: 31 additions & 0 deletions e2e/bug-12629/phpstan-baseline.neon
@@ -0,0 +1,31 @@
parameters:
	ignoreErrors:
		-
			message: '#^Method Bug12629\\Bug12629\:\:is_macintosh_enc\(\) has no return type specified\.$#'
			identifier: missingType.return
			count: 1
			path: src/bug-12629.php

		-
			message: '#^Method Bug12629\\Bug12629\:\:is_macintosh_enc\(\) has parameter \$s with no type specified\.$#'
			identifier: missingType.parameter
			count: 1
			path: src/bug-12629.php

		-
			message: '#^Method Bug12629\\Bug12629\:\:is_macintosh_enc\(\) is unused\.$#'
			identifier: method.unused
			count: 1
			path: src/bug-12629.php

		-
			message: '#^Regex pattern is invalid\: Compilation failed\: UTF\-8 error\: byte 2 top bits not 0x80 at offset 0 in pattern$#'
Contributor Author

Before this fix, this error message had the invalid pattern appended, which made the NEON parser crash when the baseline was loaded later on.
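A minimal sketch of the stripping idea, for illustration only: `stripPatternFromUtf8Error()` is a hypothetical helper name, and the sample message assumes the raw exception text ends with `in pattern: ` followed by the pattern bytes. It requires PHP 8 for `str_contains()`.

```php
<?php

// Hypothetical helper (not part of PHPStan): drops everything after the
// last colon of a "UTF-8 error" message, i.e. the appended raw pattern.
function stripPatternFromUtf8Error(string $message): string
{
    if (!str_contains($message, 'UTF-8 error')) {
        // Other messages are returned unchanged.
        return $message;
    }

    $lastColonPos = strrpos($message, ':');
    if ($lastColonPos === false) {
        // No colon to cut at; return the message unchanged in this sketch.
        return $message;
    }

    return substr($message, 0, $lastColonPos);
}

// Assumed message shape: the invalid bytes \x80 and \x9f follow the last colon.
$raw = "Compilation failed: UTF-8 error: isolated byte with 0x80 bit set at offset 1 in pattern: ![\x80-\x9f]!u";

echo stripPatternFromUtf8Error($raw), "\n";
// prints "Compilation failed: UTF-8 error: isolated byte with 0x80 bit set at offset 1 in pattern"
```

The result contains only ASCII, so it can be written into a baseline file safely. Note that cutting at the last colon assumes the pattern itself contains no colon; in this sketch, a pattern with a colon would leave part of the pattern in the message.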

			identifier: regexp.pattern
			count: 1
			path: src/bug-12629.php

		-
			message: '#^Regex pattern is invalid\: Compilation failed\: UTF\-8 error\: isolated byte with 0x80 bit set at offset 1 in pattern$#'
			identifier: regexp.pattern
			count: 1
			path: src/bug-12629.php
7 changes: 7 additions & 0 deletions e2e/bug-12629/phpstan.neon
@@ -0,0 +1,7 @@
includes:
	- phpstan-baseline.neon

parameters:
	level: 8
	paths:
		- src
17 changes: 17 additions & 0 deletions e2e/bug-12629/src/bug-12629.php
@@ -0,0 +1,17 @@
<?php

namespace Bug12629;

class Bug12629 {
	private function is_macintosh_enc($s) {

		if(!is_string($s)) {
			return false;
		}

		preg_match_all("![\x80-\x9f]!u", $s, $matchesMacintosh);
		preg_match_all("!\xc3[\x80-\x9f]!u", $s, $matchesUtf8);

		return count($matchesMacintosh[0]) > 0 && 0 == count($matchesUtf8[0]);
	}
}
12 changes: 12 additions & 0 deletions src/Rules/Regexp/RegularExpressionPatternRule.php
@@ -9,11 +9,15 @@
use PHPStan\Analyser\Scope;
use PHPStan\Rules\Rule;
use PHPStan\Rules\RuleErrorBuilder;
use PHPStan\ShouldNotHappenException;
use PHPStan\Type\Regex\RegexExpressionHelper;
use function in_array;
use function sprintf;
use function str_contains;
use function str_starts_with;
use function strrpos;
use function strtolower;
use function substr;

/**
* @implements Rule<Node\Expr\FuncCall>
@@ -123,6 +127,14 @@ private function validatePattern(string $pattern): ?string
		try {
			Strings::match('', $pattern);
		} catch (RegexpException $e) {
			if (str_contains($e->getMessage(), 'UTF-8 error')) {
				$lastColonPos = strrpos($e->getMessage(), ':');
				if ($lastColonPos === false) {
					throw new ShouldNotHappenException();
				}
				// strip invalid utf-8 pattern contents to keep the error message NEON parsable.
				return substr($e->getMessage(), 0, $lastColonPos);
			}
			return $e->getMessage();
		}