Skip to content

Support for formatting a selection #708

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 14, 2024
Merged

Support for formatting a selection #708

merged 3 commits into from
Jun 14, 2024

Conversation

DaveEwing
Copy link
Contributor

@DaveEwing DaveEwing commented Mar 22, 2024

Work in progress at the moment. I have a handful of tweaks I want to make, but it appears to work pretty well overall.

Copy link
Member

@allevato allevato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking really great! A few high-level comments to start that are mainly API/UX things.

Can you plumb through a test or two that shows the whitespace issues you're seeing? That would make it easier to help debug what's going on.

A JSON string containing an array of {"offset":<Int>, "length":<Int>} pairs specifying \
the source code ranges to format.
""")
var selection: String?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might just be for testing, but I think we can improve the user interface a little bit here; how about something like --lines <start line>:<end line> for line-based ranges and --offsets <start offset>:<end offset> for offset-based ranges? (I don't think "length" is a particularly good way to express this on the command line.) Those flags are a bit closer to what clang-format provides for its command line, and we could accept multiple occurrences of it.

To deal with parsing the s:e format, you can use swift-argument-parser's ExpressibleByArgument protocol instead of JSON decoding. Maybe the option type is just a pair of ints that implements ExpressibleByArgument and then the validate method constructs the Selection from any --lines and --offsets that it's given.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of supporting --lines <start line>:<end line> as clang-format does. This seems much easier for specifying ranges at the command line. The one issue there is having to convert to AbsolutePositions, since that's what we have with a Syntax struct. But we could do that conversion at the start. (I'm already worried about performance of all the range comparisons that happen. We don't want to make that harder.) Anyhow, maybe this can be a second PR?

I was modeling the offset/length bit (kinda) off what clang-format does. It can take multiple --offset/--length pairs. (Would swift-argument-parser support that? I assumed not, but I admit that I never looked.) I ended up using JSON decoding because it was easy. 🤷‍♂️ clang-format doesn't support --offsets <start offset>:<end offset>, so there's no precedent there. My expectation is that a human wouldn't ever specify ranges with offsets (or offsets/lengths). It's just not convenient. But it is convenient for an editor shelling out to swift-format.

If I do manage to change Selection.Range to be a typealias to Range<AbsolutePosition>, then it might be a bit more convenient to pass two offsets, rather than offset/length. Do you have a strong opinion as to whether I should bother with this refactoring? I made a start down that road, and it was a kinda annoying set of changes. 🤣

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My own personal opinion is that ranges are a bit more natural than an offset/length pair. I'm not sure what motivated clang-format's --offset and --length flags unless it was some very specific workflow that already had the information in that form, but I don't think ordered pairs of --offset X --length Y ... would work well in swift-argument-parser or for human users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, I would also prefer --lines <start line>:<end line>. And if we do the conversion up front with a SourceLocationConverter, I don’t think it should be too expensive. SourceLocationConverter needs to build up a line table but I would be surprised if that’s the same order of magnitude as formatting (would be nice to confirm though).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SourceLocationConverter is already being created in other code paths anyway, so using it to map the line numbers to offsets or vice versa shouldn't create additional overhead; we just need to make sure we're using the same one and not creating a separate one just for that purpose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I like --lines <start line>:<end line> as well. For an actual user, this is way more useful than offsets. But offsets works well when integrating with other tools. (It lets you, for instance, reformat spacing within a single line without affecting anything else. That's something you might want to do from within an editor.) Chatting with Tony on Friday we also agreed that using --offsets <start offset>:<end offset> is cleaner. I'll make that change in this PR. Supporting --lines <start line>:<end line> can come in a followup PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may add the --lines <start line>:<end line> support in this PR... at least for the testing support. Right now I've disabled the idempotency testing for non-infinite selection tests. If we support specifying lines, then we can (usually) check for idempotency.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DaveEwing Alex pointed me to this conversion because I'm trying to implement git swift-format, we do need --lines for that to work because that's the only easily extractable information from git diff-index.

@DaveEwing
Copy link
Contributor Author

This is looking really great! A few high-level comments to start that are mainly API/UX things.

Can you plumb through a test or two that shows the whitespace issues you're seeing? That would make it easier to help debug what's going on.

I've got testing support plumbed through, but I fixed the whitespace issues I was seeing.

@DaveEwing
Copy link
Contributor Author

A note on performance.

To be honest, I was rather worried about the performance of this approach. So, I did a couple of tests on a release build of swift-format using TokenStreamCreator.swift from the project (which is 4268 lines long). I used the --measure-instructions option to show how many instructions are executed when formatting the file. I compared this branch to main.

The tests are:

  • formatting the whole file (no selection)
  • formatting a selection of the whole file
  • formatting 200 lines of the file (starting at line 2000)

Here are the results:

No selection Whole file selected Selection lines 2000-2200
Before 4073650908
After 4166960035 4230133360 1552009297

In all cases, formatting finished in well under a second in real time. (I didn't precisely meassure this.)

Formatting the whole file without a selection is about 2% slower. Formatting the whole file with a selection is less than 4% slower. Formatting a small range is significantly faster. That shouldn't really be a surprise, but it is nice to confirm!

Comment on lines 24 to 26
/// If the marked text contains "➡️" and "⬅️", they're used to create a selection
public var selection: Selection?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all the markers are represented as Int that signify UTF-8 offsets, I think it would make sense to also store the ranges as Range<Int>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now using Range<Int> as they're built up, but create a Selection in the initializer. I may want to revisit this (and not have a selection property) if we support line ranges.

@DaveEwing
Copy link
Contributor Author

Pushed changes that respond to most of the comments from @allevato and @ahoppen. (I need to take another pass through the comments to see what I've missed!) I think the changes really are improvements, so thanks you two! I also need to do more testing locally... probably adding more tests - and getting the one disabled test working.

@DaveEwing DaveEwing force-pushed the dewing/FormatRanges branch 2 times, most recently from 4558882 to a1b10de Compare May 28, 2024 20:07
@DaveEwing DaveEwing marked this pull request as ready for review May 28, 2024 20:13
The basic idea here is to insert `enableFormatting` and `disableFormatting` tokens into the print stream when we enter or leave the selection. When formatting is enabled, we print out the tokens as usual. When formatting is disabled, we turn off any output until the next `enableFormatting` token. When that token is hit, we write the original source text from the location of the last `disableFormatting` to the current location.

Note that this means that all the APIs need the original source text to be passed in.

A `Selection` is represented as an enum with an `.infinite` case, and a `.ranges` case to indicate either selecting the entire file, or an array of start/end utf-8 offsets. The offset pairs are given with `Range<AbsolutePosition>`, matching the (now common) usage in swift-syntax.

For testing, allow marked text to use `⏩` and `⏪` to deliniate the start/end of a range of a selection.

The command line now takes an `--offsets` option of comma-separated "start:end" pairs to set the selection for formatting.
@ahoppen ahoppen force-pushed the dewing/FormatRanges branch from a1b10de to 6640067 Compare May 29, 2024 16:27
Comment on lines 600 to 613
if let nonWhitespace = text.rangeOfCharacter(
from: CharacterSet.whitespaces.inverted, options: .backwards) {
text = String(text[..<nonWhitespace.upperBound])
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m still missing how the whitespace characters that we’re shaving off with this are being printed.

If you have the following

func foo(){}

Then I think this only adds func foo() to the output stream, right (no whitespace)? But the next token that we print is the {, which doesn’t have any leading trivia (spaces are trailing trivia to )).

If those examples work because of some trivia re-attribution rule, how about the following?

func foo() /**/  ⏩{}
func foo() /**/
  ⏩{}

@DaveEwing DaveEwing force-pushed the dewing/FormatRanges branch from 7e7d17c to 0c0977d Compare June 4, 2024 15:11
@ahoppen
Copy link
Member

ahoppen commented Jun 5, 2024

Looking through my comments again, I think there are 3 open comments (noting them here as much for my own sake as everybody else’s)

@DaveEwing
Copy link
Contributor Author

DaveEwing commented Jun 5, 2024

github won't let me comment on I’m still missing how the whitespace characters that we’re shaving off with this are being printed, so I'll do it here. As I alluded to here, we strip off trailing whitespace when we copy the original code so that we can let the normal mechanisms do all the whitespace generation magic instead. This is actually pretty simple. It's a matter of setting just a couple of state variables. This is PrettyPrint.swift, lines 610-617.

@DaveEwing
Copy link
Contributor Author

And one additional comment. Currently all the rules that do "non-whitespace" formatting are disabled when there's a selection. For instance, formatting won't remove semicolons when there's a selection. I think that's fine for a first pass, but I want to get that working too.

My intention, and what the code is currently trying to do but is failing, was to only enable processing of nodes that are fully-contained inside individual ranges. (By "failing", I mean that it just ends up disabling all rules.) Instead, I'm now looking at enabling the rules all the time, and using the existing mechanism to turn printing on/off. That's not working yet because the source ranges of the rewritten nodes no longer match the original source. I've good some work to do to figure out how to get it right....

@ahoppen
Copy link
Member

ahoppen commented Jun 5, 2024

github won't let me comment on I’m still missing how the whitespace characters that we’re shaving off with this are being printed, so I'll do it here. As I alluded to here, we strip off trailing whitespace when we copy the original code so that we can let the normal mechanisms do all the whitespace generation magic instead. This is actually pretty simple. It's a matter of setting just a couple of state variables. This is PrettyPrint.swift, lines 610-617.

OK, I think I understand what was throwing me off now

I thought that the space and tab trivia from the syntax tree were also added to the token stream but they are not.

Also, you assume that CharacterSet.whitespace is equivalent to the .spaces, and .tabs, .formfeeds and .verticalTabs cases in Trivia right? But that’s not generally true. Two examples:

  • NO-BREAK SPACE is considered as whitespace in CharacterSet.whitespace but it’s parsed as unexpected text in swift-syntax (same for other characters from Unicode General Category Zs)
  • If you have a line comment with a trailing space, we consider that space as part of the comment and not as separate whitespace in the trivia.
    Because of this, when the formatted selection ends after a line comment with a trailing whitespace, you would remove it while whole file formatting would not trim it, I think. Similarly, if the formatted selection ends after a non-breaking space, you would remove it while whole file formatting would not.

I think we have two solutions here:

  1. Inspect the syntax tree to find characters that swift-syntax has classified as spaces or tabs
  2. Accept that there is a discrepancy between whole-file formatting and range-based formatting here. If we do that, I think we should
    a. Add a comment explaining the trade-off being made
    b. Only trim space, tab, formfeed and vertical tab instead of all characters in CharacterSet.whitespaces to align this to swift-syntax as closely as possible.

@DaveEwing
Copy link
Contributor Author

DaveEwing commented Jun 11, 2024

Also, you assume that CharacterSet.whitespace is equivalent to the .spaces, and .tabs, .formfeeds and .verticalTabs cases in Trivia right? But that’s not generally true. Two examples:

NO-BREAK SPACE is considered as whitespace in CharacterSet.whitespace but it’s parsed as unexpected text in swift-syntax (same for other characters from Unicode General Category Zs)
If you have a line comment with a trailing space, we consider that space as part of the comment and not as separate whitespace in the trivia.
Because of this, when the formatted selection ends after a line comment with a trailing whitespace, you would remove it while whole file formatting would not trim it, I think. Similarly, if the formatted selection ends after a non-breaking space, you would remove it while whole file formatting would not.
I think we have two solutions here:

Inspect the syntax tree to find characters that swift-syntax has classified as spaces or tabs
Accept that there is a discrepancy between whole-file formatting and range-based formatting here. If we do that, I think we should
a. Add a comment explaining the trade-off being made
b. Only trim space, tab, formfeed and vertical tab instead of all characters in CharacterSet.whitespaces to align this to swift-syntax as closely as possible.

Once again, github doesn't let me comment directly. Weird.

In any case, adding a non-breaking space to your source causes swift-format to fail completely. It just outputs an empty file. This is true even before the changes in this PR. I haven't bothered looking into what it's doing.

I've tried cases where a comment ends with a space, or when block comments have a trailing space. I haven't found any differences between the behavior with or without a selection. The spaces are always removed. (And the behavior matches the behavior before the changes in this PR.)

While there might be some edge cases here, they don't seem important to me.

Copy link
Member

@ahoppen ahoppen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree I am very pedantic here. A lot of it was me trying to understand how things fit together to see if there are any more significant issues lurking somewhere. But seems like there aren’t. Let’s get this merged and we can always fix it later if it’s causing any issues in practice (which I agree would probably be very niece anyway).

@allevato
Copy link
Member

Currently all the rules that do "non-whitespace" formatting are disabled when there's a selection. For instance, formatting won't remove semicolons when there's a selection. I think that's fine for a first pass, but I want to get that working too.

@DaveEwing Were you still working on this, or did you want to land this change and then come back to it? I'm fine either way (since it's not immediately obvious how deep the change will be make that work).

I'll go ahead and kick off CI again, and if it comes back clean and you're happy with the current state of the PR, we can get it merged.

Copy link
Member

@allevato allevato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is green, so let me know if you'd like this merged now!

@DaveEwing
Copy link
Contributor Author

DaveEwing commented Jun 14, 2024

Currently all the rules that do "non-whitespace" formatting are disabled when there's a selection. For instance, formatting won't remove semicolons when there's a selection. I think that's fine for a first pass, but I want to get that working too.

@DaveEwing Were you still working on this, or did you want to land this change and then come back to it? I'm fine either way (since it's not immediately obvious how deep the change will be make that work).

Part of me would like to clean this up first (maybe not fix it, but just make it more obvious in the code, and to users, that the rules are disabled). But I'd gladly do that in a second PR.

If you're up for it @allevato, let's merge, and I'll continue with polish over the next few weeks/months.

@allevato
Copy link
Member

I'll go ahead and merge then. Since this feature is only on main, we have time to iterate on it. Thanks for working on this—it's something I've wanted for a long time!

@allevato allevato merged commit 53279ee into main Jun 14, 2024
@allevato allevato deleted the dewing/FormatRanges branch June 14, 2024 11:50
ahoppen pushed a commit to ahoppen/swift-format that referenced this pull request Jun 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SR-15880] Add ability to swift-format to only re-format certain ranges of a file
4 participants