Add NodeIterator #139

felixfbecker · 2017-06-09T21:53:01Z

Closes #130

jens1o · 2017-06-09T21:54:32Z

docs/Traversal.md

+```php
+// Find all nodes in the current scope
+$nodesInScopeReIt = new \RecursiveCallbackFilterIterator($node, function ($current, $key, Iterator $iterator) {
+    // Don't traverse into function nodes, they form a different scope


jens1o

lgtm overall in this early state :)
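The `docs/Traversal.md` hunk quoted above breaks off inside the filter callback. Below is a self-contained sketch of the same idea: it collects the nodes of the current scope without descending into functions. `DemoNode`, `DemoFunctionNode`, `DemoVariableNode` and `DemoChildIterator` are hypothetical stand-ins, not the parser's classes. `RecursiveCallbackFilterIterator` and `RecursiveIteratorIterator` are the standard SPL classes.

```php
<?php
// Hypothetical stand-ins for parser nodes; not the library's classes.
class DemoNode
{
    /** @var DemoNode[] */
    public $children;

    public function __construct(array $children = [])
    {
        $this->children = $children;
    }
}

// A function node starts a new scope.
class DemoFunctionNode extends DemoNode {}

// A leaf that names a variable.
class DemoVariableNode extends DemoNode
{
    public $name;

    public function __construct(string $name)
    {
        parent::__construct();
        $this->name = $name;
    }
}

// Recursive iterator over a list of nodes that descends into each node's children.
class DemoChildIterator extends ArrayIterator implements RecursiveIterator
{
    public function hasChildren(): bool
    {
        return count($this->current()->children) > 0;
    }

    public function getChildren(): DemoChildIterator
    {
        return new DemoChildIterator($this->current()->children);
    }
}

$root = new DemoNode([
    new DemoVariableNode('a'),
    new DemoFunctionNode([new DemoVariableNode('b')]), // separate scope
    new DemoNode([new DemoVariableNode('c')]),          // e.g. an if block, same scope
]);

// Find all nodes in the current scope: rejecting function nodes also stops
// the traversal from descending into their bodies.
$nodesInScope = new RecursiveCallbackFilterIterator(
    new DemoChildIterator($root->children),
    function ($current, $key, $iterator) {
        return !($current instanceof DemoFunctionNode);
    }
);

$names = [];
foreach (new RecursiveIteratorIterator($nodesInScope, RecursiveIteratorIterator::SELF_FIRST) as $node) {
    if ($node instanceof DemoVariableNode) {
        $names[] = $node->name;
    }
}
```

When the callback rejects a node, the node and its whole subtree are dropped, so in this variant the function node itself is not reported either. The variable `b` inside the function is skipped. The variable `c` inside the plain block is kept.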

roblourens · 2017-06-10T01:19:00Z

I like it, but I'm curious what the perf impact is compared to a generator or a simple loop.

felixfbecker · 2017-06-10T09:18:27Z

Yes, it would be worth benchmarking. My gut tells me it is faster than generators, because they have historically been slow in PHP. The implementation could perhaps also be optimized a bit more at the cost of uglier code (an array index instead of an ArrayIterator could be faster).
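The sketch below illustrates the "array index instead of ArrayIterator" idea with two hypothetical cursors over a node's child-property names. The class names and the `left`/`operator`/`right` names are made up for the example. Neither class is the PR's implementation, and whether the indexed variant is actually faster would have to be measured.

```php
<?php
// Variant 1: delegate to an ArrayIterator (every step is a call on a second object).
class ComposedChildNames implements Iterator
{
    private $inner;

    public function __construct(array $childNames)
    {
        $this->inner = new ArrayIterator(array_values($childNames));
    }

    public function rewind(): void { $this->inner->rewind(); }
    public function valid(): bool { return $this->inner->valid(); }
    public function current(): string { return $this->inner->current(); }
    public function key(): int { return $this->inner->key(); }
    public function next(): void { $this->inner->next(); }
}

// Variant 2: keep the plain array and an integer index on the iterator itself.
class IndexedChildNames implements Iterator
{
    private $childNames;
    private $count;
    private $index = 0;

    public function __construct(array $childNames)
    {
        $this->childNames = array_values($childNames);
        $this->count = count($this->childNames);
    }

    public function rewind(): void { $this->index = 0; }
    public function valid(): bool { return $this->index < $this->count; }
    public function current(): string { return $this->childNames[$this->index]; }
    public function key(): int { return $this->index; }
    public function next(): void { $this->index++; }
}

// Hypothetical child property names of some node type.
$names = ['left', 'operator', 'right'];
```

Both variants yield the same sequence. The only difference is that the first one forwards every step to a second object.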

felixfbecker · 2017-06-10T18:15:29Z

Fixed it, but the test is not complete yet. Any suggestions on how the implementation could be optimized are appreciated. I would also like to add an iterator that walks upwards (following `parent`), but I don't know a good name for it.
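Here is a minimal sketch of such an upward-walking iterator. It assumes nodes expose a `parent` property that is `null` at the root. `TreeNode` and `AncestorIteratorSketch` are hypothetical names for illustration. The PR later adds a `NodeAncestorIterator`, whose actual implementation may differ.

```php
<?php
// Hypothetical node with a parent pointer; not the library's Node class.
class TreeNode
{
    public $name;
    public $parent;

    public function __construct(string $name, ?TreeNode $parent = null)
    {
        $this->name = $name;
        $this->parent = $parent;
    }
}

// Walks upwards from a node by following ->parent, nearest ancestor first.
// Assumption: the starting node itself is not yielded.
class AncestorIteratorSketch implements Iterator
{
    private $start;
    private $current;
    private $depth = 0;

    public function __construct(TreeNode $start)
    {
        $this->start = $start;
        $this->rewind();
    }

    public function rewind(): void
    {
        $this->current = $this->start->parent;
        $this->depth = 0;
    }

    public function valid(): bool { return $this->current !== null; }
    public function current(): TreeNode { return $this->current; }
    public function key(): int { return $this->depth; }

    public function next(): void
    {
        $this->current = $this->current->parent;
        $this->depth++;
    }
}

// Hypothetical node labels.
$root = new TreeNode('SourceFileNode');
$function = new TreeNode('FunctionDeclaration', $root);
$statement = new TreeNode('ReturnStatement', $function);

$ancestors = [];
foreach (new AncestorIteratorSketch($statement) as $depth => $node) {
    $ancestors[$depth] = $node->name;
}
```

Starting from the parent rather than the node itself is a design choice. To include the start node, `rewind()` would initialise `$current` with `$this->start` instead.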

jens1o · 2017-06-10T18:49:06Z

tests/api/NodeIteratorTest.php

-}
-foo();
-PHP;
+    const FILE_CONTENTS = '


use nowdoc instead?

what's the advantage?
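For reference, PHP takes a nowdoc (`<<<'PHP'`) literally, like a single-quoted string. A fixture containing `'` or `$` therefore needs no escaping, and unlike a heredoc, nothing is interpolated. The sketch below uses a hypothetical `FixtureDemo` class to show that both spellings produce the same string.

```php
<?php
// Hypothetical test class holding the same fixture written two ways.
class FixtureDemo
{
    // Single-quoted string: every ' inside the fixture must be escaped.
    const QUOTED = '<?php
$greeting = \'hi\';
echo $greeting;';

    // Nowdoc: the body is taken literally, so no escaping and no interpolation.
    const NOWDOC = <<<'PHP'
<?php
$greeting = 'hi';
echo $greeting;
PHP;
}
```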

roblourens · 2017-06-12T20:46:02Z

I wrote a very basic perf test that parses and traverses the full syntax trees of every file in the WordPress project. In that scenario, the iterator solution is much slower:

- A simple recursive function and loop over `CHILD_NAMES`: ~0.88 s
- Node's generator-based helper `getDescendantNodesAndTokens`: ~1.03 s
- `RecursiveIteratorIterator`: ~1.88 s

There seems to be a lot going on for each iteration. I hacked NodeIterator to replace the ChildNamesIterator with an index into the array, and it comes in at around 1.72 s, so there's probably more optimization that can be done. I don't know whether I'd want to use it in the language server quite yet, or call it the recommended way to traverse the tree. What do you think? It's too bad, because perf aside, I would prefer this style.
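The three approaches in the list above can be sketched side by side on a hypothetical `DemoNode` class that follows the `CHILD_NAMES` convention. This is not the parser's `Node`. The generator here only yields nodes, whereas `getDescendantNodesAndTokens` also yields tokens. The sketch shows the shape of each approach, not their relative speed.

```php
<?php
// Hypothetical node type: CHILD_NAMES lists the properties that hold children.
class DemoNode
{
    const CHILD_NAMES = ['left', 'right'];

    public $left;
    public $right;

    public function __construct(?DemoNode $left = null, ?DemoNode $right = null)
    {
        $this->left = $left;
        $this->right = $right;
    }

    /** @return DemoNode[] the non-null children, in CHILD_NAMES order */
    public function children(): array
    {
        $children = [];
        foreach (static::CHILD_NAMES as $name) {
            if ($this->$name instanceof DemoNode) {
                $children[] = $this->$name;
            }
        }
        return $children;
    }
}

// 1. Plain recursive function looping over CHILD_NAMES.
function countRecursive(DemoNode $node): int
{
    $count = 1;
    foreach ($node::CHILD_NAMES as $name) {
        if ($node->$name instanceof DemoNode) {
            $count += countRecursive($node->$name);
        }
    }
    return $count;
}

// 2. Generator yielding every descendant node (tokens are ignored here).
function descendants(DemoNode $node): Generator
{
    foreach ($node->children() as $child) {
        yield $child;
        yield from descendants($child);
    }
}

// 3. RecursiveIterator consumed through RecursiveIteratorIterator.
class DemoNodeIterator implements RecursiveIterator
{
    private $nodes;
    private $index = 0;

    public function __construct(array $nodes) { $this->nodes = $nodes; }
    public function rewind(): void { $this->index = 0; }
    public function valid(): bool { return $this->index < count($this->nodes); }
    public function current(): DemoNode { return $this->nodes[$this->index]; }
    public function key(): int { return $this->index; }
    public function next(): void { $this->index++; }
    public function hasChildren(): bool { return $this->current()->children() !== []; }
    public function getChildren(): DemoNodeIterator { return new DemoNodeIterator($this->current()->children()); }
}

// root -> (A -> B), C: four nodes in total.
$root = new DemoNode(new DemoNode(new DemoNode()), new DemoNode());

$counts = [
    'recursive' => countRecursive($root),
    'generator' => 1 + iterator_count(descendants($root)),
    'iterator'  => 1 + iterator_count(new RecursiveIteratorIterator(
        new DemoNodeIterator($root->children()),
        RecursiveIteratorIterator::SELF_FIRST
    )),
];
```

`SELF_FIRST` is needed in the third variant. With the default `LEAVES_ONLY` mode, inner nodes would not be visited.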

jens1o · 2017-06-12T20:46:53Z

Oh, the times are pretty poor...

roblourens · 2017-06-12T18:42:13Z

src/Node.php

+     * @return NodeIterator
+     */
+    public function getIterator() {
+        return new NodeIterator($this);


Should this be Iterator\NodeIterator?

Whoops, yes. I didn't change it when I moved it to the namespace.
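The fix relies on PHP's relative name resolution. Inside a namespace, PHP prefixes a qualified name such as `Iterator\NodeIterator` with the current namespace, while it looks up an unqualified `NodeIterator` only in the current namespace, with no global fallback for classes. The sketch below uses a hypothetical `Demo` root namespace, since this thread does not show the real namespace layout.

```php
<?php
// Hypothetical namespaces mirroring the review comment.
namespace Demo\Iterator;

class NodeIterator extends \ArrayIterator
{
    public function __construct(\Demo\Node $node)
    {
        parent::__construct($node->children);
    }
}

namespace Demo;

class Node implements \IteratorAggregate
{
    public $children;

    public function __construct(array $children = [])
    {
        $this->children = $children;
    }

    // Inside namespace Demo, Iterator\NodeIterator resolves to
    // Demo\Iterator\NodeIterator; a bare NodeIterator would resolve to
    // Demo\NodeIterator, which does not exist.
    public function getIterator(): \Iterator
    {
        return new Iterator\NodeIterator($this);
    }
}
```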

roblourens · 2017-06-12T20:52:57Z

Those times include parsing, so the relative difference in the time spent iterating is even more significant.
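To time only the iteration, the trees can be built before the clock starts. Here is a sketch of that structure. The hypothetical `buildTree()` and `countNodes()` functions on nested arrays stand in for parsing and traversal; they are not the benchmark scripts from the branch.

```php
<?php
// Stand-in for parsing: a node is an array of child nodes.
function buildTree(int $depth, int $fanout): array
{
    if ($depth === 0) {
        return [];
    }
    $children = [];
    for ($i = 0; $i < $fanout; $i++) {
        $children[] = buildTree($depth - 1, $fanout);
    }
    return $children;
}

// Stand-in for traversal: count every node recursively.
function countNodes(array $node): int
{
    $count = 1;
    foreach ($node as $child) {
        $count += countNodes($child);
    }
    return $count;
}

// Phase 1 (untimed): build all trees up front, like parsing every file first.
$trees = [];
for ($f = 0; $f < 10; $f++) {
    $trees[] = buildTree(6, 3);
}

// Phase 2 (timed): only the traversal.
$start = microtime(true);
$total = 0;
foreach ($trees as $tree) {
    $total += countNodes($tree);
}
$elapsed = microtime(true) - $start;

printf("Total nodes: %d\nTraversal time (s): %.4f\n", $total, $elapsed);
```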

felixfbecker · 2017-06-12T20:53:08Z

Too bad! Can you share the benchmark code? I will try to squeeze more performance out of it (that was my suspicion anyway). I prefer this style too, because the returned iterator keeps the tree's nested structure instead of flattening it, but if it's that much slower I wouldn't use it in the language server.
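The sketch below gives a small illustration of what keeping the nesting buys. It uses SPL's `RecursiveArrayIterator` on a nested array that stands in for an AST; the shape and node names are hypothetical. Because the walk stays recursive, the consumer can read the current depth with `getDepth()` or limit it with `setMaxDepth()`. A flat generator offers neither without extra bookkeeping.

```php
<?php
// A nested array stands in for an AST: keys are node names, arrays are nodes
// with children, strings are leaf tokens.
$tree = [
    'ClassDeclaration' => [
        'MethodDeclaration' => [
            'ReturnStatement' => 'return 1;',
        ],
    ],
    'FunctionDeclaration' => [
        'EchoStatement' => 'echo 1;',
    ],
];

$it = new RecursiveIteratorIterator(
    new RecursiveArrayIterator($tree),
    RecursiveIteratorIterator::SELF_FIRST
);

// The consumer can ask how deep the walk currently is.
$lines = [];
foreach ($it as $name => $value) {
    $lines[] = str_repeat('  ', $it->getDepth()) . $name;
}
echo implode("\n", $lines), "\n";

// The nesting can also cut the walk short, e.g. top level only.
$it->setMaxDepth(0);
$topLevel = array_keys(iterator_to_array($it));
```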

roblourens · 2017-06-12T20:58:44Z

Sure, here: https://github.com/Microsoft/tolerant-php-parser/tree/iteratorPerfTest

Run with `php -d memory_limit=500M validation/iteratorPerf1.php` (and `iteratorPerf2.php`, `iteratorPerf3.php`) for approaches 1, 2 and 3. This is very quick and dirty, and it's possible that I'm doing something dumb.

felixfbecker · 2017-06-13T21:37:06Z

I tried everything I could to optimize it, but the Generator API is still twice as fast. It should also be mentioned, though, that "reimplementing" the traversal inline instead of using the Generator API is twice as fast again (about 4x as fast as the iterator), so the more ergonomic the API, the worse the performance.

```
✔ ~/git/tolerant-php-parser [iterator ↑·2|✚ 3…1]
23:18 $ php validation/iteratorPerf3.php
0102030405060708090100
Total nodes: 403197
MACHINE INFO
============
PHP int size: 8
PHP version: 7.1.0
OS: Darwin Felixs-MacBook-Pro.local 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

PERF STATS
==========
Input Source Files (#): 101
Input Source Size (MB): 10.603403091431

Time Usage (seconds): 0.62322616577148
Memory Usage (MB): 0
✔ ~/git/tolerant-php-parser [iterator ↑·2|✚ 3…1]
23:18 $ php validation/iteratorPerf2.php
0102030405060708090100
Total nodes and tokens: 403197
MACHINE INFO
============
PHP int size: 8
PHP version: 7.1.0
OS: Darwin Felixs-MacBook-Pro.local 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

PERF STATS
==========
Input Source Files (#): 101
Input Source Size (MB): 10.603403091431

Time Usage (seconds): 0.30183696746826
Memory Usage (MB): 0
✔ ~/git/tolerant-php-parser [iterator ↑·2|✚ 3…1]
23:19 $ php validation/iteratorPerf1.php
0102030405060708090100
Total nodes: 376330
MACHINE INFO
============
PHP int size: 8
PHP version: 7.1.0
OS: Darwin Felixs-MacBook-Pro.local 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

PERF STATS
==========
Input Source Files (#): 101
Input Source Size (MB): 10.603403091431

Time Usage (seconds): 0.14891004562378
Memory Usage (MB): 0
✔ ~/git/tolerant-php-parser [iterator|✚ 2…1]
23:31 $
```
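The runs above can be mapped as iteratorPerf3 = `RecursiveIteratorIterator`, iteratorPerf2 = generator helper and iteratorPerf1 = inline loop. This mapping is an assumption based on the 1, 2, 3 order of the earlier list. Under it, the speedups in the comment work out as follows:

$$
\frac{0.6232\ \text{s}}{0.3018\ \text{s}} \approx 2.06,
\qquad
\frac{0.6232\ \text{s}}{0.1489\ \text{s}} \approx 4.19,
\qquad
\frac{0.3018\ \text{s}}{0.1489\ \text{s}} \approx 2.03
$$

iteratorPerf1 reports 376330 nodes while the other two report 403197, so the workloads are not exactly identical and these ratios are approximate.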

felixfbecker · 2017-06-14T12:26:13Z

I think we should put it in perspective and benchmark real application code, i.e. benchmark the PHP language server indexing a project with one API vs. the other. It may be twice as fast, but if the bottleneck is elsewhere and the choice of API only makes the difference between, e.g., 10 s and 11 s of indexing time, then I would still prefer the more advanced API if it makes the LS code more readable and DRY.

felixfbecker · 2017-06-14T12:32:35Z

It also depends on the use case. Some users might just want to write a little script for a single large automated code refactor, which they need to write quickly and then run once. We could document that there are essentially three ways to traverse ASTs, with trade-offs between performance and API ergonomics. Users can then pick what they want.

roblourens · 2017-06-15T23:12:29Z

I agree that we should try it out in a real-world situation. It may be that while visiting every node is slower, it's easier to traverse more efficiently or something like that. I'll revisit this once I'm through the current outstanding issues.

Add NodeIterator

99fd539

msftclas added the cla-already-signed label Jun 9, 2017

jens1o reviewed Jun 9, 2017

jens1o approved these changes Jun 9, 2017

Fixes

79e87c8

Fix InlineHtml order

dca82af

jens1o reviewed Jun 10, 2017

felixfbecker added 5 commits June 10, 2017 21:15

Fix typos

0f82f05

Complete test

ba494d4

Improve docs

aa0a707

Polish

fd6358f

Add AncestorIterator

1cb1356

felixfbecker changed the title ~~WIP: Add NodeIterator~~ Add NodeIterator Jun 10, 2017

felixfbecker added 2 commits June 10, 2017 22:56

Add docs for NodeAncestorIterator

d63e978

Fix NodeAncestorIteratorTest

d42d70b

Add perf test scripts

6379ba9

roblourens reviewed Jun 12, 2017

Fix NodeIterator namespace reference

23b9868

felixfbecker added 3 commits June 13, 2017 19:14

Don't include parsing in benchmarks

348abb9

Make faster

4ee5b75

Can't help it

035e87e

msftgits removed the cla-already-signed label Sep 26, 2017

microsoft deleted a comment from msftclas Sep 27, 2017

felixfbecker mentioned this pull request Mar 1, 2018

Refactor / performance: Track scope information while traversing felixfbecker/php-language-server#609

Open

roblourens closed this Nov 19, 2019
