Remove duplicate impl of string unescape from parse_format #137995

hkBst · 2025-03-04T12:20:26Z

rustbot · 2025-03-04T12:20:31Z

rust-analyzer is developed in its own repository. If possible, consider making this change to rust-lang/rust-analyzer instead.

cc @rust-lang/rust-analyzer

nnethercote · 2025-03-04T23:55:38Z

This is a large (+447/−547) change with zero explanation. I need some context and motivation, please! Also, from skimming it I think there might be multiple distinct changes in the single commit? If so, it would be easier to review if they were separate.

hkBst · 2025-03-05T09:26:19Z

> This is a large (+447/−547) change with zero explanation. I need some context and motivation, please!

Ah, sorry about that. Let me try and fix that:

The idea for this comes from this code at the bottom of rustc_parse_format/lib.rs:

fn unescape_string(string: &str) -> Option<String> {
    let mut buf = String::new();
    let mut ok = true;
    unescape::unescape_unicode(string, unescape::Mode::Str, &mut |_, unescaped_char| {
        match unescaped_char {
            Ok(c) => buf.push(c),
            Err(_) => ok = false,
        }
    });

    ok.then_some(buf)
}

which does unescaping but throws away all span information from the original string (via the _ in &mut |_, unescaped_char|). This function is called in fn find_width_map_from_snippet:

    let Some(unescaped) = unescape_string(snippet) else {
        return InputStringKind::NotALiteral;
    };

find_width_map_from_snippet then does its own light version of string unescaping to build a width map (if the unescaped string matches the input string). The width map is basically a list of expansions from the unescaped string back to the original string, and it has to be traversed to determine the old position (this happens in fn remap_pos, fn to_span_index, fn to_span_width, and fn span). Doing a Vec traversal for each original position that is needed is quadratic behavior: the total cost is linear in both the length of the width map Vec and the number of such translations from new to old position.
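As a rough illustration of the lookup cost described above, here is a minimal sketch of a width map and its position translation. WidthEntry, remap_pos and remap_all are hypothetical simplifications written for this example; they are not the actual rustc_parse_format definitions, and the field semantics are an assumption.

```rust
/// Hypothetical width-map entry (not the real rustc type): the char that
/// starts at byte `position` of the unescaped string took `before` bytes in
/// the source snippet and takes `after` bytes once unescaped.
struct WidthEntry {
    position: usize,
    before: usize,
    after: usize,
}

/// Translate a byte position in the unescaped string back to the source.
/// Every call walks the map from the start, so one lookup is linear in the
/// number of entries that precede `pos`.
fn remap_pos(width_map: &[WidthEntry], pos: usize) -> usize {
    let mut shift = 0;
    for entry in width_map {
        if entry.position >= pos {
            break;
        }
        // Each earlier escape pushes later source positions to the right.
        shift += entry.before - entry.after;
    }
    pos + shift
}

/// Translating many positions repeats that walk each time, which is where
/// the (map length) x (number of lookups) cost comes from.
fn remap_all(width_map: &[WidthEntry], positions: &[usize]) -> Vec<usize> {
    positions.iter().map(|&p| remap_pos(width_map, p)).collect()
}
```

For the source text a\nb\tc (two two-byte escapes), the map would hold entries at unescaped positions 1 and 3, and remap_pos(&map, 3) would walk past the first entry and return 4, the byte offset of the second backslash.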

The new code does the unescaping in Parser::new, while collecting the position information into a Vec, and checking that the unescaped string matches the input string like so:

                // snippet is not a raw string
                if snippet.starts_with('"') {
                    // snippet looks like an ordinary string literal
                    // check whether it is the escaped version of input
                    let without_quotes = &snippet[1..snippet.len() - 1];
                    let (mut ok, mut vec) = (true, vec![]);
                    let mut chars = input.chars();
                    unescape::unescape_unicode(
                        without_quotes,
                        unescape::Mode::Str,
                        &mut |range, res| match res {
                            Ok(ch) if ok && chars.next().is_some_and(|c| ch == c) => {
                                vec.push((range, ch));
                            }
                            _ => {
                                ok = false;
                                vec = vec![];
                            }
                        },
                    );

Here we're feeling some pain from the callback-based nature of unescaping, which forces the collection of span info into a Vec (at least I could not see a good alternative). Basically, the Parser needs to know the position of each character both in input (Peekable<CharIndices<'a>> in the current code) and in the original string as typed, snippet (width_map plus translation functions in the current code). The new code ultimately collects a Vec<(original span, char byte pos in input, char in input)> for the same purpose, so it probably uses more memory.
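To make the shape of that Vec concrete, here is a small self-contained sketch. It is not the PR's code: unescape_simple is a toy stand-in for unescape::unescape_unicode that only understands \n, \t, \\ and \", and build_table is a hypothetical helper name. The point is only to show how a callback-based unescaper can feed a table of (range in snippet, byte pos in input, char), after which finding the original span of the i-th char is a plain index operation.

```rust
use std::ops::Range;

/// Toy callback-based unescaper (an assumption for this sketch). It reports
/// each unescaped char, or an error, with its byte range in `src`.
fn unescape_simple<F>(src: &str, callback: &mut F)
where
    F: FnMut(Range<usize>, Result<char, ()>),
{
    let mut chars = src.char_indices();
    while let Some((start, c)) = chars.next() {
        if c != '\\' {
            callback(start..start + c.len_utf8(), Ok(c));
            continue;
        }
        // A backslash starts an escape; the next char selects the result.
        let (end, res) = match chars.next() {
            Some((i, e)) => {
                let res = match e {
                    'n' => Ok('\n'),
                    't' => Ok('\t'),
                    '\\' => Ok('\\'),
                    '"' => Ok('"'),
                    _ => Err(()),
                };
                (i + e.len_utf8(), res)
            }
            // Trailing backslash with nothing after it.
            None => (src.len(), Err(())),
        };
        callback(start..end, res);
    }
}

/// Build one entry per char: (range in `snippet`, byte pos in `input`, char).
/// Returns `None` if unescaping fails or the result is not exactly `input`.
fn build_table(snippet: &str, input: &str) -> Option<Vec<(Range<usize>, usize, char)>> {
    let mut table = Vec::new();
    let mut ok = true;
    let mut input_chars = input.char_indices();
    unescape_simple(snippet, &mut |range, res| match (res, input_chars.next()) {
        (Ok(ch), Some((pos, c))) if ok && ch == c => table.push((range, pos, ch)),
        _ => ok = false,
    });
    // The unescaped snippet must cover all of `input`, not just a prefix.
    (ok && input_chars.next().is_none()).then_some(table)
}
```

With a table like this, table[i].0 is the span of the i-th input char in the snippet as typed, at the price of storing one entry per char.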

Most of the other changes are because of this change of span info representation.

> Also, from skimming it I think there might be multiple distinct changes in the single commit?

It is possible, but this code went through a lot of iterations, as I came to understand the exact constraints imposed by the ui tests, and this is basically the first version that passes all those ui tests.

There are a few minor things that come to mind:

I ended up inlining fn self.suggest_format_parameter into its single use, since most of the code was just duplicate work. I'm not sure if that is also possible with the old way of handling the span info.
I also ended up inlining err and err_with_note, one of which had a single use, and the other two or three uses, since they did not seem to be carrying their weight.

I'm hoping this is enough to get the broad idea, such that you can start asking about specific bits of this change, but let me know if there is more you need or that I can do to clarify.

hkBst · 2025-03-05T09:28:18Z

Given the changes in work done and the probable increase in memory use:
@bors try @rust-timer queue

bors · 2025-03-05T09:28:21Z

@hkBst: 🔑 Insufficient privileges: not in try users

nnethercote · 2025-03-05T12:10:57Z

@bors try @rust-timer queue

bors · 2025-03-05T12:12:12Z

⌛ Trying commit 94fb87a with merge fe03ab0...

bors · 2025-03-05T14:11:34Z

☀️ Try build successful - checks-actions
Build commit: fe03ab0 (fe03ab008f4474bd7092d967e9eb28cfe09d0664)

rust-timer · 2025-03-05T15:27:54Z

Finished benchmarking commit (fe03ab0): comparison URL.

Overall result: no relevant changes - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 777.245s -> 777.886s (0.08%)
Artifact size: 362.11 MiB -> 362.15 MiB (0.01%)

nnethercote

Sorry for the slow review. This looks good. A few nits to address. I didn't follow every single detail, but it looks like a clear simplification. The removal of InnerSpan, InputStringKind and InnerWidthMapping in particular is good.

You inlined a few functions; it would have been good to do them in separate commits. Also I wonder if the InnerSpan-to-Range change could have been done in its own commit, before the other changes. (Just thinking out loud; it's always a good idea to split up changes into multiple commits where possible, to make life easier for the reviewer.)

nnethercote · 2025-03-20T03:35:41Z

compiler/rustc_builtin_macros/src/format.rs

+        };
+        let Some(argument_binding) = ty.kind.is_simple_path() else {
+            continue;
+        };


This change is unnecessary, AFAICT.

Indeed, change removed.

nnethercote · 2025-03-20T03:48:04Z

compiler/rustc_parse_format/src/lib.rs

@@ -90,24 +44,24 @@ pub enum Piece<'a> {
 }

 /// Representation of an argument specification.
-#[derive(Copy, Clone, Debug, PartialEq)]


If you use #![feature(new_range_api)] you can use the new experimental core::range::Range type, which implements Copy. That would avoid some clone calls you've had to add.

We are using this crate in rust-analyzer, so we'd appreciate if it kept building on stable.

Given @lnicola's objection, I'll leave this for now.
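For readers wondering where the extra clone calls come from: on stable, std::ops::Range<usize> implements Clone but not Copy, so a struct holding one cannot derive Copy, and using the range through a shared reference needs an explicit clone. A minimal sketch follows; Piece and text_of are hypothetical names for illustration, not items from rustc_parse_format.

```rust
use std::ops::Range;

/// Hypothetical parsed piece that remembers its span in the input.
/// Adding `Copy` to this derive list would not compile, because
/// `Range<usize>` is not `Copy`.
#[derive(Clone, Debug, PartialEq)]
struct Piece {
    span: Range<usize>,
}

/// Return the text covered by a piece.
fn text_of<'a>(input: &'a str, piece: &Piece) -> &'a str {
    // Indexing takes the range by value, and `piece` is only borrowed,
    // so the range has to be cloned here.
    &input[piece.span.clone()]
}
```

The clone is cheap (two usize values), so staying on std::ops::Range mostly costs some noise in the code rather than performance.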

nnethercote · 2025-03-20T03:56:11Z

compiler/rustc_parse_format/src/lib.rs

                havewidth = true;
            } else {
                spec.zero_pad = true;
            }
        }

        if !havewidth {
-            let start = self.current_pos();
-            spec.width = self.count(start);
+            let start_ix = self.index;


start_idx or start_index would be more idiomatic for this code base.

nnethercote · 2025-03-20T03:58:49Z

compiler/rustc_parse_format/src/lib.rs

@@ -234,91 +188,90 @@ pub enum Suggestion {
 pub struct Parser<'a> {
    mode: ParseMode,
    input: &'a str,
-    cur: std::iter::Peekable<std::str::CharIndices<'a>>,
+    input_vec: Vec<(Range<usize>, usize, char)>,
+    index: usize,


This could have a better name, one that indicates what it indexes into. Is it input_vec? If so, input_vec_index would be appropriate.

Comments on these new fields (and maybe input) would also be helpful.

Name changed and comments added.

nnethercote · 2025-03-20T04:03:02Z

@rustbot author

hkBst · 2025-03-21T13:51:22Z

> Sorry for the slow review. This looks good. A few nits to address. I didn't follow every single detail, but it looks like a clear simplification. The removal of InnerSpan, InputStringKind and InnerWidthMapping in particular is good.

No problem. Glad we're in agreement here.

> You inlined a few functions; it would have been good to do them in separate commits. Also I wonder if the InnerSpan-to-Range change could have been done in its own commit, before the other changes. (Just thinking out loud; it's always a good idea to split up changes into multiple commits where possible, to make life easier for the reviewer.)

Thanks for the advice. It is good to get your perspective. I'll try to be more mindful of the reviewer's job. Thanks for reviewing!

hkBst · 2025-03-21T15:02:10Z

@rustbot ready

nnethercote · 2025-03-24T03:42:49Z

@bors r+

bors · 2025-03-24T03:42:52Z

📌 Commit 4711153 has been approved by nnethercote

It is now in the queue for this repository.

bors · 2025-03-24T04:13:42Z

⌛ Testing commit 4711153 with merge 4652ee6...

jieyouxu · 2025-03-24T06:04:54Z

Sorry, perf and CI LLVM are a bit broken atm. Please re-approve once bootstrap & perf are fixed.
@bors retry r-

hkBst · 2025-04-04T11:09:37Z

@rustbot ready

bors · 2025-04-06T20:45:01Z

☔ The latest upstream changes (presumably #139452) made this pull request unmergeable. Please resolve the merge conflicts.

rustbot assigned nnethercote Mar 4, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 4, 2025

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 5, 2025

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 5, 2025

Auto merge of rust-lang#137995 - hkBst:parse_format_reuse_unescape, r…

fe03ab0

…=<try> Remove duplicate impl of string unescape from parse_format r? `@nnethercote`

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Mar 5, 2025

nnethercote reviewed Mar 20, 2025

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 20, 2025

Remove duplicate impl of string unescape

4711153

rust-cloud-vms bot force-pushed the parse_format_reuse_unescape branch from 94fb87a to 4711153 Compare March 21, 2025 14:57

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 21, 2025

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 24, 2025

bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 24, 2025

Auto merge of rust-lang#137995 - hkBst:parse_format_reuse_unescape, r…

4652ee6

…=nnethercote Remove duplicate impl of string unescape from parse_format r? `@nnethercote`

bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Mar 24, 2025

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 4, 2025
