Add NodeIterator #139
docs/Traversal.md
Outdated
```php
// Find all nodes in the current scope
$nodesInScopeReIt = new \RecursiveCallbackFilterIterator($node, function ($current, $key, Iterator $iterator) {
    // Don't traverse into function nodes, they form a differnt scope
```
typo
lgtm overall in this early state :)
I like it, but I'm curious what the perf impact is compared to a generator or a simple loop.
Yes, it would be worth benchmarking. My gut tells me it is faster than generators, because they have historically been slow in PHP. The implementation could probably also be optimised further at the cost of uglier code (an array index instead of ArrayIterator could be faster).
Fixed it, but the test is not complete yet. Any suggestions on how the implementation could be optimized are appreciated. I would also like to add an Iterator that walks upwards (following
tests/api/NodeIteratorTest.php
Outdated
}
foo();
PHP;
const FILE_CONTENTS = '
use nowdoc instead?
what's the advantage?
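For readers weighing the nowdoc suggestion, here is a minimal sketch of the difference. The variable names and fixture text are hypothetical and not taken from the test file. A heredoc interpolates variables like a double-quoted string, while a nowdoc keeps the text verbatim, which tends to matter when the fixture is itself PHP source containing `$variables` and quotes.

```php
<?php
// Hypothetical value, only here to make interpolation visible.
$name = 'world';

// Heredoc: behaves like a double-quoted string, so $name is interpolated.
$heredoc = <<<PHP
function foo() { return "$name"; }
PHP;

// Nowdoc: behaves like a single-quoted string, so the text is kept verbatim
// and neither quotes nor dollar signs need escaping.
$nowdoc = <<<'PHP'
function foo() { return "$name"; }
PHP;

echo $heredoc, "\n";
echo $nowdoc, "\n";
```

Compared with a plain single-quoted string, a nowdoc also avoids having to escape single quotes inside the fixture.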
I wrote a very basic perf test that parses and traverses the full syntax trees of every file in the WordPress project. In that scenario, the Iterator solution is much slower.
There seems to be a lot going on for each iteration. I hacked NodeIterator to replace the ChildNamesIterator with an index into the array, and I see it at around 1.72s, so there's probably more optimization that can be done. I don't know whether I'd want to use it in the language server quite yet or call it the recommended way to traverse the tree. What do you think? It's too bad because, perf aside, I would prefer this style.
Oh, the times are pretty poor...
src/Node.php
Outdated
 * @return NodeIterator
 */
public function getIterator() {
    return new NodeIterator($this);
Should this be Iterator\NodeIterator?
woops, yes. didn't change it when I moved it to the namespace.
Those times include parsing, so the difference in time spent iterating is more significant.
Too bad! Can you share the benchmark code? I will try to squeeze more performance out of it (that was my suspicion anyway). I prefer this style too because the returned Iterator maintains the semantics of being nested, not flat, but if it's that much slower I wouldn't use it in the language server.
Sure, here: https://github.com/Microsoft/tolerant-php-parser/tree/iteratorPerfTest Run with
I tried everything I could to optimize, but the Generator API is twice as fast. But it should also be mentioned that "reimplementing" the traversal inline instead of using the Generator API is 4x as fast, so the more ergonomic the API, the worse the performance.
I think we should put it in perspective and benchmark real application code, i.e. benchmark the PHP language server indexing a project with one API vs the other. One may be twice as fast, but if the bottleneck is elsewhere and the choice of API only changes indexing time from e.g. 10s to 11s, then I would still prefer the more advanced API if it makes the LS code more readable and DRY.
It also depends on the use case. Some users might just want to write a little script for a single large automatic code refactor and just need to write it quickly and then execute it once. We could document that there are essentially three ways to traverse ASTs that have trade-offs in performance vs API ergonomics. Users can then pick what they want.
I agree that we should try it out in a real-world situation. It may be that while visiting every node is slower, it's easier to traverse more efficiently or something like that. I'll revisit this once I'm through the current outstanding issues.
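To make the three traversal styles concrete, here is a rough sketch on a toy tree. `TinyNode`, `TinyNodeIterator`, and the three helper functions are hypothetical stand-ins written for illustration; they are not the parser's `Node` or `NodeIterator` classes, and the sketch says nothing about their real performance.

```php
<?php
// Hypothetical stand-in for a syntax tree node; not the parser's real Node class.
final class TinyNode
{
    public $name;
    public $children;

    public function __construct(string $name, array $children = [])
    {
        $this->name = $name;
        $this->children = $children;
    }
}

// Style 1: an SPL RecursiveIterator. It keeps the nested structure and
// composes with filters such as RecursiveCallbackFilterIterator.
final class TinyNodeIterator implements RecursiveIterator
{
    private $nodes;
    private $index = 0; // plain array index rather than a wrapped ArrayIterator

    public function __construct(array $nodes) { $this->nodes = $nodes; }

    public function current(): TinyNode { return $this->nodes[$this->index]; }
    public function key(): int { return $this->index; }
    public function next(): void { $this->index++; }
    public function rewind(): void { $this->index = 0; }
    public function valid(): bool { return isset($this->nodes[$this->index]); }
    public function hasChildren(): bool { return $this->nodes[$this->index]->children !== []; }
    public function getChildren(): TinyNodeIterator { return new self($this->nodes[$this->index]->children); }
}

function namesViaIterator(TinyNode $root): array
{
    $names = [];
    // SELF_FIRST yields each parent before its children (pre-order).
    $it = new RecursiveIteratorIterator(new TinyNodeIterator([$root]), RecursiveIteratorIterator::SELF_FIRST);
    foreach ($it as $node) {
        $names[] = $node->name;
    }
    return $names;
}

// Style 2: a recursive generator that yields a flat pre-order stream.
function walkNodes(TinyNode $node): Generator
{
    yield $node;
    foreach ($node->children as $child) {
        yield from walkNodes($child);
    }
}

function namesViaGenerator(TinyNode $root): array
{
    $names = [];
    foreach (walkNodes($root) as $node) {
        $names[] = $node->name;
    }
    return $names;
}

// Style 3: the traversal written inline with an explicit stack.
function namesInline(TinyNode $root): array
{
    $names = [];
    $stack = [$root];
    while ($stack) {
        $node = array_pop($stack);
        $names[] = $node->name;
        // Push children in reverse so the first child is popped first.
        foreach (array_reverse($node->children) as $child) {
            $stack[] = $child;
        }
    }
    return $names;
}

$root = new TinyNode('root', [new TinyNode('a', [new TinyNode('b')]), new TinyNode('c')]);
echo implode(',', namesViaIterator($root)), "\n";
echo implode(',', namesViaGenerator($root)), "\n";
echo implode(',', namesInline($root)), "\n";
```

All three visit the same nodes in the same pre-order. Only the iterator version exposes the nesting to the caller, which is the ergonomic advantage discussed above. The inline loop has the least per-node machinery.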
Closes #130