Issues with syntax #35
The fact that the parentheses on constructors are optional is the culprit here: I don't remember how C# solved this; need to find that paper again. |
I read about it a while ago, C# handles it case by case, since Adding something innocuous to the syntax might be a compromise. What do you think of this?

new A<use B, C>(D);
foo(new A<use B, C>(D)); |
I believe other languages don't have this problem because in Java, C#, etc. there is no ambiguity in using '<>' in expressions like this. And Hack, which already implements generics, infers the type from the arguments, so it doesn't have this issue to begin with.

Another approach would be to simply allow the ambiguity, introducing a BC break for all valid expressions which are also valid new- or instanceof-statements using the '<>' type alias list. This cannot be done with the LR(1) parser currently used, but Bison supports GLR parsers. It's a minor change in code really. The shift/reduce conflict generated by the '<>' construct will cause the parser to branch off, following both paths until one path fails, or allowing us to specify which branch has precedence when both paths succeed.

For example, the statements below would still work (no BC break):

class Test {
public function __toString() {
return 'A';
}
}
var_dump(new Test<stdClass);
var_dump(new Test<stdClass, stdClass> 'A');
var_dump(new Test<stdClass && stdClass > 'A');
echo new Test<stdClass;
echo new Test<stdClass, stdClass> 'A';
echo new Test<stdClass && stdClass > 'A';

Output:

bool(true)
bool(true)
bool(true)
bool(true)
1111

And these statements would be BC breakages:

class Test<X, Y> {
public function __toString() {
return 'A';
}
}
define('stdClass', 'B');
var_dump(new Test<stdClass, stdClass>(1));
echo new Test<stdClass, stdClass>(1);

Output:

object(Test)#1 (0) {
}
A

I believe the chance of BC breakage in the wild is very small, because the existing code would have to use an argument- or echo-list with a new/instanceof statement without parentheses in one argument that compares it with the LT operator against a global constant, and a second argument comparing a global constant with GT against an expression enclosed in parentheses. What are the odds?

Porting old code to support the new rules would be easy:

var_dump(new Test()<stdClass, stdClass>(1));

Or:

var_dump((new Test<stdClass), stdClass>(1));

Does this make any sense? |
PHP does not have a comma operator. |
I think he meant the comma as a separator for function args in a construction like this: call (new Date < FOO, BAR > ('now') ) |
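To make the two-argument reading concrete, here is a minimal sketch that runs on current PHP (without generics). The names `Date`, `FOO`, `BAR` and `call` are hypothetical and exist only for this illustration; `__toString()` is there so the object-to-string comparison is well defined.

```php
<?php
// Hypothetical names for illustration only: Date, FOO, BAR, call.
const FOO = 'B';
const BAR = 'z';

class Date
{
    // Lets the object take part in a string comparison.
    public function __toString(): string
    {
        return 'A';
    }
}

// Returns how many arguments it received.
function call(...$args): int
{
    return count($args);
}

// Without generics, the parser sees two comparisons separated by a comma:
//   (new Date < FOO)   and   (BAR > ('now'))
$argc = call(new Date < FOO, BAR > ('now'));
echo $argc, "\n"; // prints 2
```

With generics added, the same token sequence could also be read as a single argument constructing `Date<FOO, BAR>` with `'now'`, which is the ambiguity discussed here.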
Ha! 😆 Well, that explains why nobody's using it. |
We may need to make this bit of syntax whitespace-sensitive then?
Obviously that kind of goes against the nature of the language, but considering...
That is, I mean, there might be existing code that looks like this:
But who would put extra parens around a string and why? The worst-case realistic version of that probably is this:
If we prioritize a valid generic expression, there's no ambiguity here, right? I mean, there's a pretty serious visual ambiguity - but once we add generics, the parse tree where the start of this expression is interpreted as generics doesn't result in a valid parse tree for the whole expression, so it should be possible for this particular case, right? If there's a concern about this breaking something in real-world code, I'd have to see a better, more realistic example - the extra parens around a string is so contrived, I can't imagine that will really affect anyone. (and if it were to affect one or two people around the world, it is really easy to fix.) |
We discussed our options here with @bwoebi yesterday. Some notes. There are, I think, basically three issues with
Assuming we want to stick with Lexer lookahead has the same issues as already discussed with arrow functions. Doing lookahead for the basic

The other alternative is to switch to a GLR parser. I have created a prototype for this. The GLR parser works by pursuing both possible parse trees and then discarding the one that generates an error. For the true ambiguity in the

As we don't have any potential for nesting, GLR should not exhibit any worst-case (exponential) performance here (though it might make parsing slower overall). However, the main disadvantage of GLR is that all PHP tooling will also have to switch to GLR or an equivalent. This is going to be a pain for PHP-Parser at least, as we don't have a GLR parser implemented in PHP currently. |
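The "pursue both parses, discard the failing one" idea can be sketched as a toy. This is not how Bison's GLR mode is implemented (it forks LR stacks); here two hand-written recognizers stand in for the two branches, and the token arrays, function names and the tiny operand grammar are all assumptions made for the illustration.

```php
<?php
// Toy sketch only: hypothetical helpers over pre-split token arrays,
// far simpler than PHP's real grammar or Bison's GLR machinery.

function is_name(?string $t): bool
{
    return $t !== null && $t !== 'new' && preg_match('/^[A-Za-z_]\w*$/', $t) === 1;
}

function is_literal(?string $t): bool
{
    return $t !== null && preg_match("/^(\\d+|'[^']*')$/", $t) === 1;
}

// Branch 1: new Name < Name (, Name)* > ( ... )
function parses_as_generic(array $t): bool
{
    if (($t[0] ?? null) !== 'new' || !is_name($t[1] ?? null) || ($t[2] ?? null) !== '<') {
        return false;
    }
    $i = 3;
    while (true) {
        if (!is_name($t[$i] ?? null)) return false;
        $i++;
        if (($t[$i] ?? null) !== ',') break;
        $i++;
    }
    return ($t[$i] ?? null) === '>' && ($t[$i + 1] ?? null) === '(' && end($t) === ')';
}

// Reads one operand at $i; returns the next index, or null on failure.
function read_operand(array $t, int $i): ?int
{
    $cur = $t[$i] ?? null;
    if ($cur === 'new') {
        return is_name($t[$i + 1] ?? null) ? $i + 2 : null;
    }
    if ($cur === '(') {
        return (is_literal($t[$i + 1] ?? null) && ($t[$i + 2] ?? null) === ')') ? $i + 3 : null;
    }
    return (is_name($cur) || is_literal($cur)) ? $i + 1 : null;
}

// Branch 2: comma-separated list of `operand` or `operand (<|>) operand`.
// Comparisons are non-associative, so `a < b > c` fails, as in PHP.
function parses_as_comparisons(array $t): bool
{
    $i = 0;
    $n = count($t);
    while (true) {
        $i = read_operand($t, $i);
        if ($i === null) return false;
        if (in_array($t[$i] ?? null, ['<', '>'], true)) {
            $i = read_operand($t, $i + 1);
            if ($i === null) return false;
        }
        if ($i === $n) return true;
        if ($t[$i] !== ',') return false;
        $i++;
    }
}

// Try both branches; when both survive, the generic reading takes precedence.
function fork_parse(array $tokens): string
{
    $generic = parses_as_generic($tokens);
    $compare = parses_as_comparisons($tokens);
    if ($generic) return 'generic';
    return $compare ? 'comparison' : 'error';
}
```

For example, `fork_parse(['new','Test','<','X',',','Y','>',"'A'"])` keeps only the comparison branch, while `['new','Test','<','X',',','Y','>','(','1',')']` is accepted by both branches and resolved in favour of generics.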
Next to However, the
Doing this is possible, but comes at the cost of avoiding any zero-length productions between the class name and the start of the class statements. This effectively means that we need to spell out all the possible combinations of generic params, extends and interfaces (making for 6 rules per root class kind). A prototype for this is available as well. @bwoebi's official position on using |
What about using lookahead for now and looking into switching the parser in the future? |
It is not that trivial, e.g. you'll have to ignore comments in between, any whitespace, etc. As to userland matters (PHP-Parser ...), they may opt to do the lexer lookahead. |
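As a rough idea of what such a userland lookahead has to deal with, here is a minimal sketch built on `token_get_all()` from PHP's tokenizer extension. The function `looks_like_generic_new()` is hypothetical (it is not part of php-src or PHP-Parser), it only handles unqualified names (`T_STRING`), and it assumes the snippet starts at the `new` keyword.

```php
<?php
// Hypothetical helper, for illustration only.
// Returns true when $code starts with: new Name < Name (, Name)* > (
function looks_like_generic_new(string $code): bool
{
    // The lookahead must see through whitespace and comments.
    $skip = [T_OPEN_TAG, T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
    $tokens = [];
    foreach (token_get_all('<?php ' . $code) as $tok) {
        if (is_array($tok) && in_array($tok[0], $skip, true)) {
            continue;
        }
        $tokens[] = $tok;
    }

    $isName = function ($t): bool {
        return is_array($t) && $t[0] === T_STRING;
    };
    $n = count($tokens);

    // Expect: new Name <
    if ($n < 3 || !is_array($tokens[0]) || $tokens[0][0] !== T_NEW) {
        return false;
    }
    if (!$isName($tokens[1]) || $tokens[2] !== '<') {
        return false;
    }

    // Expect: Name (, Name)*
    $i = 3;
    while (true) {
        if ($i >= $n || !$isName($tokens[$i])) {
            return false;
        }
        $i++;
        if ($i < $n && $tokens[$i] === ',') {
            $i++;
            continue;
        }
        break;
    }

    // Expect: > (
    return $i + 1 < $n && $tokens[$i] === '>' && $tokens[$i + 1] === '(';
}
```

Under these assumptions, `looks_like_generic_new("new Test < /* c */ stdClass > (1);")` is true even with the comment in between, while `looks_like_generic_new("new Test<stdClass, stdClass> 'A';")` is false because no `(` follows the `>`. The scan is unbounded in the number of type arguments, which is the finite-lookahead concern raised in this thread.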
I personally prefer |
While my purist part agrees with you, I think it's a bit better to move forward with <> despite GLR; I don't want contention about the grammar in the RFC phase. |
Do you remember the fuss when PHP introduced the backslash as namespace separator instead of Just stick with |
Could we alleviate that issue by exposing an API that provides access to the AST? I know this doesn't address all use-cases (such as wanting to parse older PHP code, but PHP-Parser could continue to do that) but maybe it alleviates the pressure to maintain an entire PHP parser in PHP itself? |
Is it really necessary to use GLR? What if we simply define that

$list_of_a = new List<A>;
$list_of_a = new List<A>();
$list_of_a = (new List<A>);
$a_lt_n = (new A) < $n; // mandatory parenthesis around new, otherwise syntax error
$a_lt_n = new A() < $n; // parenthesis separate the operator
$b = call((new A) < $n, $m > 0);
$b = call(new A() < $n, $m > 0); |
@jkufner that is actually quite simple and elegant. 🙂 |
GLR is needed mainly to solve the finite-lookahead issue. The actual |
Looks like someone is working on exposing the native AST now! https://twitter.com/lisachenko/status/1217887229470822400?s=19 |
Would there be no issues exposing the generic/template type name as the usual type using the |
I think the As for On a related note, PHP being dynamic, would something like |
@mindplay-dk Thanks, that makes sense. |
@morrisonlevi You mentioned a few issues with ambiguity regarding the syntax within statements.
The only one I am currently aware of is:
Did you have any ideas on how to solve it?