Skip to content

Add NodeIterator #139

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from
Closed

Conversation

felixfbecker
Copy link

@felixfbecker felixfbecker commented Jun 9, 2017

Closes #130

```php
// Find all nodes in the current scope
$nodesInScopeReIt = new \RecursiveCallbackFilterIterator($node, function ($current, $key, Iterator $iterator) {
// Don't traverse into function nodes, they form a differnt scope
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Copy link
Contributor

@jens1o jens1o left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm overall in this early state :)

@roblourens
Copy link
Member

I like it but I'm curious what the perf impact is compared to a generator or a simple loop

@felixfbecker
Copy link
Author

Yes, would be worth to benchmark. My gut is telling me that it is faster than generators, cause they have been slow in PHP historically. The implementation maybe could also be optimised a bit more at the cost of uglier code (array index instead of ArrayIterator could be faster).

@felixfbecker
Copy link
Author

felixfbecker commented Jun 10, 2017

Fixed it, but the test is not complete yet. Any suggestions how the implementation could be optimized appreciated. Would also like to add an Iterator that walks upwards (following parent), but don't know a good name.

}
foo();
PHP;
const FILE_CONTENTS = '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use nowdoc instead?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the advantage?

@felixfbecker felixfbecker changed the title WIP: Add NodeIterator Add NodeIterator Jun 10, 2017
@roblourens
Copy link
Member

I wrote a very basic perf test that parses and traverses the full syntax trees of every file in the Wordpress project. In that scenario, the Iterator solution is much slower.

  • A simple recursive function and loop over CHILD_NAMES: ~.88s
  • Node's getDescendantNodesAndTokens generator-based helper: ~1.03s
  • RecursiveIteratorIterator: ~1.88s

There seems to be a lot going on for each iteration. I hacked NodeIterator to replace the ChildNamesIterator with an index into the array, and I see it around 1.72s, so there's probably more optimization that can be done. I don't know whether I'd want to use it in the language server quite yet or call it the recommended way to traverse the tree. What do you think? It's too bad because perf aside, I would prefer this style.

@jens1o
Copy link
Contributor

jens1o commented Jun 12, 2017

Oh, the times are pretty poor....

src/Node.php Outdated
* @return NodeIterator
*/
public function getIterator() {
return new NodeIterator($this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be Iterator\NodeIterator?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

woops, yes. didn't change it when I moved it to the namespace.

@roblourens
Copy link
Member

Those times include parsing so the time diff spent iterating is more significant.

@felixfbecker
Copy link
Author

Too bad! Can you share the benchmark code? I will try to squeeze more performance out of it (that was my suspicion anyway). I prefer this style too because the returned Iterator maintains the semantics of being nested, not flat, but if it's that slower I wouldn't use it in the language server.

@roblourens
Copy link
Member

Sure, here: https://github.com/Microsoft/tolerant-php-parser/tree/iteratorPerfTest

Run with php -d memory_limit=500M validation/iteratorPerf1.php for 1, 2, 3. This is very quick and dirty and it's possible that I'm doing something dumb.

@felixfbecker
Copy link
Author

felixfbecker commented Jun 13, 2017

I tried everything I could to optimize, but the Generator API is twice as fast. But it should also be mentioned that "reimplementing" the traversal inline instead of using the the Generator API is 4x as fast, so the more ergonomic the API, the worse the performance.

✔ ~/git/tolerant-php-parser [iterator ↑·2|✚ 3…1]
23:18 $ php validation/iteratorPerf3.php
0102030405060708090100
Total nodes: 403197
MACHINE INFO
============
PHP int size: 8
PHP version: 7.1.0
OS: Darwin Felixs-MacBook-Pro.local 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

PERF STATS
==========
Input Source Files (#): 101
Input Source Size (MB): 10.603403091431

Time Usage (seconds): 0.62322616577148
Memory Usage (MB): 0
✔ ~/git/tolerant-php-parser [iterator ↑·2|✚ 3…1]
23:18 $ php validation/iteratorPerf2.php
0102030405060708090100
Total nodes and tokens: 403197
MACHINE INFO
============
PHP int size: 8
PHP version: 7.1.0
OS: Darwin Felixs-MacBook-Pro.local 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

PERF STATS
==========
Input Source Files (#): 101
Input Source Size (MB): 10.603403091431

Time Usage (seconds): 0.30183696746826
Memory Usage (MB): 0
✔ ~/git/tolerant-php-parser [iterator ↑·2|✚ 3…1]
23:19 $ php validation/iteratorPerf1.php
0102030405060708090100
Total nodes: 376330
MACHINE INFO
============
PHP int size: 8
PHP version: 7.1.0
OS: Darwin Felixs-MacBook-Pro.local 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

PERF STATS
==========
Input Source Files (#): 101
Input Source Size (MB): 10.603403091431

Time Usage (seconds): 0.14891004562378
Memory Usage (MB): 0
✔ ~/git/tolerant-php-parser [iterator|✚ 2…1]
23:31 $

@felixfbecker
Copy link
Author

I think we should put it in perspective and benchmark real application code. I.e. benchmark the PHP language server indexing a project with one API vs the other. It may be twice as fast, but if the bottleneck is elsewhere and one API or the other is only indexing time of e.g. 10s vs 11s than I would still prefer the more advanced API if it makes the LS code more readable and DRY.

@felixfbecker
Copy link
Author

It also depends on the use case. Some users might just want to write a little script for a single large automatic code refactor and just need to write it quickly and then execute it once. We could document that there are essentially three ways to traverse ASTs that have trade-offs in performance vs API ergonomics. Users can then pick what they want.

@roblourens
Copy link
Member

I agree that we should try it out in a real-world situation. It may be that while visiting every node is slower, it's easier to traverse more efficiently or something like that. I'll revisit this once I'm through the current outstanding issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement RecursiveIterator
5 participants