-
Notifications
You must be signed in to change notification settings - Fork 247
Support for formatting a selection #708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really great! A few high-level comments to start that are mainly API/UX things.
Can you plumb through a test or two that shows the whitespace issues you're seeing? That would make it easier to help debug what's going on.
A JSON string containing an array of {"offset":<Int>, "length":<Int>} pairs specifying \ | ||
the source code ranges to format. | ||
""") | ||
var selection: String? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might just be for testing, but I think we can improve the user interface a little bit here; how about something like --lines <start line>:<end line>
for line-based ranges and --offsets <start offset>:<end offset>
for offset-based ranges? (I don't think "length" is a particularly good way to express this on the command line.) Those flags are a bit closer to what clang-format provides for its command line, and we could accept multiple occurrences of it.
To deal with parsing the s:e
format, you can use swift-argument-parser's ExpressibleByArgument
protocol instead of JSON decoding. Maybe the option type is just a pair of ints that implements ExpressibleByArgument
and then the validate
method constructs the Selection
from any --lines
and --offsets
that it's given.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of supporting --lines <start line>:<end line>
as clang-format does. This seems much easier for specifying ranges at the command line. The one issue there is having to convert to AbsolutePosition
s, since that's what we have with a Syntax
struct. But we could do that conversion at the start. (I'm already worried about performance of all the range comparisons that happen. We don't want to make that harder.) Anyhow, maybe this can be a second PR?
I was modeling the offset/length bit (kinda) off what clang-format does. It can take multiple --offset
/--length
pairs. (Would swift-argument-parser support that? I assumed not, but I admit that I never looked.) I ended up using JSON decoding because it was easy. 🤷♂️ clang-format doesn't support --offsets <start offset>:<end offset>
, so there's no precedent there. My expectation is that a human wouldn't ever specify ranges with offsets (or offsets/lengths). It's just not convenient. But it is convenient for an editor shelling out to swift-format.
If I do manage to change Selection.Range
to be a typealias
to Range<AbsolutePosition>
, then it might be a bit more convenient to pass two offsets, rather than offset/length. Do you have a strong opinion as to whether I should bother with this refactoring? I made a start down that road, and it was a kinda annoying set of changes. 🤣
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My own personal opinion is that ranges are a bit more natural than an offset/length pair. I'm not sure what motivated clang-format's --offset
and --length
flags unless it was some very specific workflow that already had the information in that form, but I don't think ordered pairs of --offset X --length Y ...
would work well in swift-argument-parser or for human users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, I would also prefer --lines <start line>:<end line>
. And if we do the conversion up front with a SourceLocationConverter
, I don’t think it should be too expensive. SourceLocationConverter
needs to build up a line table but I would be surprised if that’s the same order of magnitude as formatting (would be nice to confirm though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SourceLocationConverter
is already being created in other code paths anyway, so using it to map the line numbers to offsets or vice versa shouldn't create additional overhead; we just need to make sure we're using the same one and not creating a separate one just for that purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I like --lines <start line>:<end line>
as well. For an actual user, this is way more useful than offsets. But offsets works well when integrating with other tools. (It lets you, for instance, reformat spacing within a single line without affecting anything else. That's something you might want to do from within an editor.) Chatting with Tony on Friday we also agreed that using --offsets <start offset>:<end offset>
is cleaner. I'll make that change in this PR. Supporting --lines <start line>:<end line>
can come in a followup PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may add the --lines <start line>:<end line>
support in this PR... at least for the testing support. Right now I've disabled the idempotency testing for non-infinite selection tests. If we support specifying lines, then we can (usually) check for idempotency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaveEwing Alex pointed me to this conversion because I'm trying to implement git swift-format
, we do need --lines
for that to work because that's the only easily extractable information from git diff-index
.
I've got testing support plumbed through, but I fixed the whitespace issues I was seeing. |
A note on performance. To be honest, I was rather worried about the performance of this approach. So, I did a couple of tests on a release build of The tests are:
Here are the results:
In all cases, formatting finished in well under a second in real time. (I didn't precisely meassure this.) Formatting the whole file without a selection is about 2% slower. Formatting the whole file with a selection is less than 4% slower. Formatting a small range is significantly faster. That shouldn't really be a surprise, but it is nice to confirm! |
/// If the marked text contains "➡️" and "⬅️", they're used to create a selection | ||
public var selection: Selection? | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all the markers are represented as Int
that signify UTF-8 offsets, I think it would make sense to also store the ranges as Range<Int>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now using Range<Int>
as they're built up, but create a Selection
in the initializer. I may want to revisit this (and not have a selection
property) if we support line ranges.
Pushed changes that respond to most of the comments from @allevato and @ahoppen. (I need to take another pass through the comments to see what I've missed!) I think the changes really are improvements, so thanks you two! I also need to do more testing locally... probably adding more tests - and getting the one disabled test working. |
4558882
to
a1b10de
Compare
The basic idea here is to insert `enableFormatting` and `disableFormatting` tokens into the print stream when we enter or leave the selection. When formatting is enabled, we print out the tokens as usual. When formatting is disabled, we turn off any output until the next `enableFormatting` token. When that token is hit, we write the original source text from the location of the last `disableFormatting` to the current location. Note that this means that all the APIs need the original source text to be passed in. A `Selection` is represented as an enum with an `.infinite` case, and a `.ranges` case to indicate either selecting the entire file, or an array of start/end utf-8 offsets. The offset pairs are given with `Range<AbsolutePosition>`, matching the (now common) usage in swift-syntax. For testing, allow marked text to use `⏩` and `⏪` to deliniate the start/end of a range of a selection. The command line now takes an `--offsets` option of comma-separated "start:end" pairs to set the selection for formatting.
a1b10de
to
6640067
Compare
if let nonWhitespace = text.rangeOfCharacter( | ||
from: CharacterSet.whitespaces.inverted, options: .backwards) { | ||
text = String(text[..<nonWhitespace.upperBound]) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m still missing how the whitespace characters that we’re shaving off with this are being printed.
If you have the following
func foo() ⏩{}⏪
Then I think this only adds func foo()
to the output stream, right (no whitespace)? But the next token that we print is the {
, which doesn’t have any leading trivia (spaces are trailing trivia to )
).
If those examples work because of some trivia re-attribution rule, how about the following?
func foo() /**/ ⏩{}⏪
func foo() /**/
⏩{}⏪
…ore test cases. For formatting a selection (<#297>).
… support passing multiple of them.
7e7d17c
to
0c0977d
Compare
Looking through my comments again, I think there are 3 open comments (noting them here as much for my own sake as everybody else’s)
|
github won't let me comment on I’m still missing how the whitespace characters that we’re shaving off with this are being printed, so I'll do it here. As I alluded to here, we strip off trailing whitespace when we copy the original code so that we can let the normal mechanisms do all the whitespace generation magic instead. This is actually pretty simple. It's a matter of setting just a couple of state variables. This is PrettyPrint.swift, lines 610-617. |
And one additional comment. Currently all the rules that do "non-whitespace" formatting are disabled when there's a selection. For instance, formatting won't remove semicolons when there's a selection. I think that's fine for a first pass, but I want to get that working too. My intention, and what the code is currently trying to do but is failing, was to only enable processing of nodes that are fully-contained inside individual ranges. (By "failing", I mean that it just ends up disabling all rules.) Instead, I'm now looking at enabling the rules all the time, and using the existing mechanism to turn printing on/off. That's not working yet because the source ranges of the rewritten nodes no longer match the original source. I've good some work to do to figure out how to get it right.... |
OK, I think I understand what was throwing me off now I thought that the space and tab trivia from the syntax tree were also added to the token stream but they are not. Also, you assume that
I think we have two solutions here:
|
Once again, github doesn't let me comment directly. Weird. In any case, adding a non-breaking space to your source causes swift-format to fail completely. It just outputs an empty file. This is true even before the changes in this PR. I haven't bothered looking into what it's doing. I've tried cases where a comment ends with a space, or when block comments have a trailing space. I haven't found any differences between the behavior with or without a selection. The spaces are always removed. (And the behavior matches the behavior before the changes in this PR.) While there might be some edge cases here, they don't seem important to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree I am very pedantic here. A lot of it was me trying to understand how things fit together to see if there are any more significant issues lurking somewhere. But seems like there aren’t. Let’s get this merged and we can always fix it later if it’s causing any issues in practice (which I agree would probably be very niece anyway).
@DaveEwing Were you still working on this, or did you want to land this change and then come back to it? I'm fine either way (since it's not immediately obvious how deep the change will be make that work). I'll go ahead and kick off CI again, and if it comes back clean and you're happy with the current state of the PR, we can get it merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI is green, so let me know if you'd like this merged now!
Part of me would like to clean this up first (maybe not fix it, but just make it more obvious in the code, and to users, that the rules are disabled). But I'd gladly do that in a second PR. If you're up for it @allevato, let's merge, and I'll continue with polish over the next few weeks/months. |
I'll go ahead and merge then. Since this feature is only on |
Support for formatting a selection
Work in progress at the moment. I have a handful of tweaks I want to make, but it appears to work pretty well overall.