-
Notifications
You must be signed in to change notification settings - Fork 13.3k
Remove duplicate impl of string unescape from parse_format #137995
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead. cc @rust-lang/rust-analyzer |
This is a large (+447/−547) change with zero explanation. I need some context and motivation, please! Also, from skimming it I think there might be multiple distinct changes in the single commit? If so, it would be easier to review if they were separate. |
Ah, sorry about that. Let me try and fix that: The idea for this comes from this code at the bottom of rustc_parse_format/lib.rs:
which does unescaping but throws away all span information from the original string (via the
which then does its own light version of string unescape to build a width map (if the unescaped string matches the input string), which is basically a list of expansions from the unescaped string back to the original string, which has to be traversed to determine the old position (this happens in The new code does the unescaping in
Here we're feeling some pain from the callback-based nature of unescaping, which forces the collection of span info into a Vec (at least I could not see a good alternative). Basically, the Parser needs to know the position of each character in Most of the other changes are because of this change of span info representation.
It is possible, but this code went through a lot of iterations, as I came to understand the exact constraints imposed by the ui tests, and this is basically the first version that passes all those ui tests. There are a few minor things that come to mind:
I'm hoping this is enough to get the broad idea, such that you can start asking about specific bits of this change, but let me know if there is more you need or that I can do to clarify. |
Given the changes in work done and the probable increase in memory use: |
This comment has been minimized.
This comment has been minimized.
@hkBst: 🔑 Insufficient privileges: not in try users |
This comment has been minimized.
This comment has been minimized.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
…=<try> Remove duplicate impl of string unescape from parse_format r? `@nnethercote`
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (fe03ab0): comparison URL. Overall result: no relevant changes - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 777.245s -> 777.886s (0.08%) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the slow review. This looks good. A few nits to address. I didn't follow every single detail, but it looks like a clear simplification. The removal of InnerSpan
, InputStringKind
and InnerWidthMapping
in particular are good.
You inlined a few functions, it would have been good to do them in separate commits. Also I wonder if the InnerSpan
-to-Range
change could have been done in its own commit, before the other changes. (Just thinking out loud; it's always a good idea to split up changes into multiple commits where possible, to make life easier for the reviewer.)
}; | ||
let Some(argument_binding) = ty.kind.is_simple_path() else { | ||
continue; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is unnecessary, AFAICT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, change removed.
@@ -90,24 +44,24 @@ pub enum Piece<'a> { | |||
} | |||
|
|||
/// Representation of an argument specification. | |||
#[derive(Copy, Clone, Debug, PartialEq)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use #![feature(new_range_api)]
you can use the new experimental core::range::Range
type, which implements Copy
. That would avoid some clone
calls you've had to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using this crate in rust-analyzer, so we'd appreciate if it kept building on stable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given, @lnicola's objection I'll leave this for now.
havewidth = true; | ||
} else { | ||
spec.zero_pad = true; | ||
} | ||
} | ||
|
||
if !havewidth { | ||
let start = self.current_pos(); | ||
spec.width = self.count(start); | ||
let start_ix = self.index; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
start_idx
or start_index
would be more idiomatic for this code base.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
@@ -234,91 +188,90 @@ pub enum Suggestion { | |||
pub struct Parser<'a> { | |||
mode: ParseMode, | |||
input: &'a str, | |||
cur: std::iter::Peekable<std::str::CharIndices<'a>>, | |||
input_vec: Vec<(Range<usize>, usize, char)>, | |||
index: usize, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could have a better name, one that indicates what it indexes into. Is it input_vec
? If so, input_vec_index
would be appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments on these new fields (and maybe input
) would also be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Name changed and comments added.
@rustbot author |
No problem. Glad we're in agreement here.
Thanks for the advice. It is good to get your perspective. I'll try to be more mindful of the reviewer's job. Thanks for reviewing! |
94fb87a
to
4711153
Compare
@rustbot ready |
@bors r+ |
…=nnethercote Remove duplicate impl of string unescape from parse_format r? `@nnethercote`
Sorry, perf and CI LLVM is a bit broken atm. Please re-approve once bootstrap & perf is fixed. |
@rustbot ready |
☔ The latest upstream changes (presumably #139452) made this pull request unmergeable. Please resolve the merge conflicts. |
r? @nnethercote