Skip to content

Root AST #5137

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jan 16, 2019
Merged

Root AST #5137

merged 13 commits into from
Jan 16, 2019

Conversation

helixbass
Copy link
Collaborator

@GeoffreyBooth PR for root Block AST

Based on jsx-element-ast, here is the diff against that branch

src/nodes.coffee Outdated
rootToAst: ->
programLocationData = @astLocationData()

programAst = Object.assign
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The root of a Babel-style AST has a root File node with a Program child node

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than a Block having two “modes,” root and body, should we perhaps split this class up into a base and children? Like BlockBase as a parent for Root and Block?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also wonder if file.program should be handled outside of the node classes, since that’s more like part of the file itself than part of the AST? We could return a root with expressions like we were before, or perhaps rename expressions to body in the AST, and then the parent file.program could be added outside of this like in coffeescript.coffee or somewhere else.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than a Block having two “modes,” root and body, should we perhaps split this class up into a base and children? Like BlockBase as a parent for Root and Block?

Ok introduced a new Root class, but made it a parent of the root Block rather than a subclass variation of it

I also wonder if file.program should be handled outside of the node classes, since that’s more like part of the file itself than part of the AST?

Handling the root File AST node outside of the node classes makes a certain amount of sense conceptually but in practice it seems like it should stay in the node classes since (a) that File node needs to have comments attached to it and (b) it seems weird not to be able to use our AST methods to generate that root AST node

So I think this makes the most sense - now we can actually use the standard AST methods for both the File AST node (which corresponds to the Root node class) and the Program AST node (which corresponds to the root block, ie the body property of the Root node class instance), so from an AST-generation perspective it's a significant cleanup

This also cleans up the root case when compiling JS a bit - now Root::compileNode() does most of what Block::compileRoot() did previously, and Block::compileToFragments() is gone (since Root::compileNode() can directly call the still-special Block::compileRoot() on its @body since it already knows that it's the root block)

The interesting thing I ran into was that producing/expecting this new Root node was breaking the existing interfaces of CoffeeScript.nodes() and the corresponding call to .compile() (on the root node returned by CoffeeScript.nodes())

For .compile(), I added a backwards-compatibility override Block::compile() which checks if it needs to wrap itself in a new Root. So that way any existing code that expects to be able to call .compile() on a "root" Block instance will still work

For CoffeeScript.nodes() it was slightly more interesting. We could just return the new Root instance (as the root of the node class instance tree), and then clean up internal references (eg in repl.coffee, a few tests) that expect the return value of CoffeeScript.nodes() to be a Block instance

But then any third-party consumers of the CoffeeScript.nodes() API that expect it to return a Block would break. So what I did was return the "root" Block instance (ie the body of the Root instance) from CoffeeScript.nodes(). This is a little weird in that it makes the Root instance impossible to obtain a reference to via CoffeeScript.nodes(), but I think it's preferable to introducing breakage for consumers of CoffeeScript.nodes()?

src/nodes.coffee Outdated
type: 'Program'
sourceType: 'module'
body: @bodyToAst()
directives: []
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, don't look for directives (eg 'use strict')

src/nodes.coffee Outdated
Object.assign
type: 'File'
program: programAst
comments: []
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, don't worry about comments

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are going to be a nightmare.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are and they aren't:

  • We only need to provide a list of all comments here on the root File node (not individual comments attached to their corresponding AST nodes), so we don't have to worry about that
  • But there were a fair number of little things that needed tweaking to get comments to behave correctly/as expected for Prettier and ESLint (eg location data, more awareness of comments when placing generated tokens)

I'm planning on comments being the last big thing we cover in these PRs (after we've gone through all the node classes)

src/nodes.coffee Outdated
programLocationData

ast: ->
@rootToAst()
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now act as though all Blocks are root blocks. Will introduce nested blocks next

src/nodes.coffee Outdated
rootToAst: ->
programLocationData = @astLocationData()

programAst = Object.assign
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than a Block having two “modes,” root and body, should we perhaps split this class up into a base and children? Like BlockBase as a parent for Root and Block?

Copy link
Collaborator Author

@helixbass helixbass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth updated to add Root node class, see comments

src/nodes.coffee Outdated
rootToAst: ->
programLocationData = @astLocationData()

programAst = Object.assign
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than a Block having two “modes,” root and body, should we perhaps split this class up into a base and children? Like BlockBase as a parent for Root and Block?

Ok introduced a new Root class, but made it a parent of the root Block rather than a subclass variation of it

I also wonder if file.program should be handled outside of the node classes, since that’s more like part of the file itself than part of the AST?

Handling the root File AST node outside of the node classes makes a certain amount of sense conceptually but in practice it seems like it should stay in the node classes since (a) that File node needs to have comments attached to it and (b) it seems weird not to be able to use our AST methods to generate that root AST node

So I think this makes the most sense - now we can actually use the standard AST methods for both the File AST node (which corresponds to the Root node class) and the Program AST node (which corresponds to the root block, ie the body property of the Root node class instance), so from an AST-generation perspective it's a significant cleanup

This also cleans up the root case when compiling JS a bit - now Root::compileNode() does most of what Block::compileRoot() did previously, and Block::compileToFragments() is gone (since Root::compileNode() can directly call the still-special Block::compileRoot() on its @body since it already knows that it's the root block)

The interesting thing I ran into was that producing/expecting this new Root node was breaking the existing interfaces of CoffeeScript.nodes() and the corresponding call to .compile() (on the root node returned by CoffeeScript.nodes())

For .compile(), I added a backwards-compatibility override Block::compile() which checks if it needs to wrap itself in a new Root. So that way any existing code that expects to be able to call .compile() on a "root" Block instance will still work

For CoffeeScript.nodes() it was slightly more interesting. We could just return the new Root instance (as the root of the node class instance tree), and then clean up internal references (eg in repl.coffee, a few tests) that expect the return value of CoffeeScript.nodes() to be a Block instance

But then any third-party consumers of the CoffeeScript.nodes() API that expect it to return a Block would break. So what I did was return the "root" Block instance (ie the body of the Root instance) from CoffeeScript.nodes(). This is a little weird in that it makes the Root instance impossible to obtain a reference to via CoffeeScript.nodes(), but I think it's preferable to introducing breakage for consumers of CoffeeScript.nodes()?

src/nodes.coffee Outdated
Object.assign
type: 'File'
program: programAst
comments: []
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are and they aren't:

  • We only need to provide a list of all comments here on the root File node (not individual comments attached to their corresponding AST nodes), so we don't have to worry about that
  • But there were a fair number of little things that needed tweaking to get comments to behave correctly/as expected for Prettier and ESLint (eg location data, more awareness of comments when placing generated tokens)

I'm planning on comments being the last big thing we cover in these PRs (after we've gone through all the node classes)

@GeoffreyBooth
Copy link
Collaborator

GeoffreyBooth commented Nov 30, 2018

verify my assumption that it’ll be easy to “monkeypatch” it back in as needed inside the ESLint plugin, since there are places there that expect it to be present

I would think you could just add it as a property in your middleware plugin, like (psuedocode):

ast = CoffeeScript.compile(source, ast: yes)
ast.program.sourceType = 'module'
# use `ast` with ESLint

Copy link
Collaborator Author

@helixbass helixbass left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeoffreyBooth updated per your comments

I would think you could just add it as a property in your middleware plugin

Yup the ESLint plugin already has a transformation step (to go from Babel-style AST to ESLint/espree-style AST, like what babel-eslint does), so this would fit naturally there

… location data, rather than extracting it from the whole AST object; move all the logic into one function, rather than spreading it out across several functions on the Block class that all appear to be internal
@GeoffreyBooth
Copy link
Collaborator

GeoffreyBooth commented Dec 3, 2018

@helixbass I pushed some changes, please review in particular 3677d11

I refactored Block.astProperties to use expression.astLocationData() to get location data, rather than extracting the location data from the whole AST object; this felt more object-oriented and logical to me. I also moved the logic for generating astProperties into that method, rather than spreading it out across several methods on the Block class that all appear to be internal (as in it wouldn’t make sense to call those methods from outside of Block, they’re really just helper functions for astProperties). All the tests still pass, please let me know what you think.

One other thing is that we’re not defining Block.astLocationData, but it appears we don’t need to, the version it inherits from Base seems to work fine. We’re not testing the location data for the Program node, though; not sure if we need to.

@helixbass
Copy link
Collaborator Author

@GeoffreyBooth the changes you made look fine including the refactor

You're right there currently aren't AST location data tests for the root File/Program. I just was messing with this and ran into some issues related to whether the location data of a trailing TERMINATOR (generated or not) gets include in the File/Program AST location data. I think this deserves a little deeper examination so I'd like to treat this as a separate location-data-related TODO

@GeoffreyBooth
Copy link
Collaborator

I noticed exactly the same thing, about a newline at the end of the file, when I was comparing our AST for something trivial like new Date() with the AST from astexplorer.net. Might be worth nailing down which way it should be and adding a test for that.

@helixbass
Copy link
Collaborator Author

@GeoffreyBooth ok I pushed a commit fixing some of the behavior related to root location data and with a commented-out TODO test for a part that still isn't behaving correctly

The thing that's still not behaving correctly is the case when there's an actual trailing newline in the source file. From what I can tell the root of the issue is that in outdentToken() in lexer.coffee (called by closeIndentation()) it always creates the TERMINATOR token with length: 0 (regardless of whether it corresponds to an actual newline and thus should be length: 1)

But I don't feel confident trying to apply an immediate fix there, so I left a commented-out AST location data test case with an actual trailing newline which should pass once we apply a fix such that the location data of the trailing TERMINATOR token is always correct

@GeoffreyBooth
Copy link
Collaborator

Which parser are you treating as your target in astexplorer.net? babylon7?

I’m thinking about the case where the code ends with multiple newlines, or a comment:

a = 1


# copyright 2018

In astexplorer, everything all the way to the last character in the comment is included in the root location end. I’m thinking that we may need to handle this as a special case, either in coffeescript.coffee or by noting the original source length from the lexer and passing it as extra data to the Root class.

@helixbass
Copy link
Collaborator Author

Which parser are you treating as your target in astexplorer.net? babylon7?

@GeoffreyBooth yes typically babylon7

I’m thinking about the case where the code ends with multiple newlines, or a comment

This is a good special case to check if it handles it correctly. We should wait until we've gotten through comments to address it though, since there will be similar including-comment-locations-in-Block-location-data fixes included there, which would probably affect the handling of this particular special case (where the "root block" location data should include a trailing comment)

@helixbass
Copy link
Collaborator Author

@GeoffreyBooth was there something else you wanted to see addressed in this PR?

@GeoffreyBooth
Copy link
Collaborator

GeoffreyBooth commented Jan 15, 2019

@helixbass Please take a look at helixbass@a85b987, I think I’ve fixed the location data end values for the root/Program»File node. I think the issue was due to the clean function in the lexer, which trims extraneous whitespace before the lexing begins.

@helixbass
Copy link
Collaborator Author

@GeoffreyBooth ok the approach you took makes sense to me (handling the root File/Program node location data as a special case where we just reference the original complete source code string directly rather than try and account for destructive operations eg the lexer's clean()), merging

@helixbass
Copy link
Collaborator Author

@GeoffreyBooth oh wait I can't merge into ast, so looks good for you to merge 👍

@GeoffreyBooth GeoffreyBooth merged commit 38c8b2f into jashkenas:ast Jan 16, 2019
@GeoffreyBooth
Copy link
Collaborator

Awesome, thanks 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants