Skip to content

Allow training PGO on a custom crate and enable it Windows on CI #19585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 15, 2025

Conversation

Kobzol
Copy link
Contributor

@Kobzol Kobzol commented Apr 14, 2025

This PR adds functionality to train PGO on a custom crate downloaded from GitHub (e.g. --pgo clap-rs/clap), and enables PGO for Windows on CI.

I added the functionality because training on itself produced OOMs on Windows CI.

I didn't enable PGO for aarch64-pc-windows-msvc, because there it's more complicated because of the cross-compilation.

If someone wants to test the performance effect, they could compare the following builds:

Download and extract the dist-x86_64-pc-windows-msvc artifact, and then run rust-analyzer.exe analysis-stats --run-all-ide-things <rust-analyzer-checkout-directory>.

Related issue: #9412

@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 14, 2025
@Kobzol Kobzol force-pushed the pgo-windows branch 4 times, most recently from c2c5ac1 to 5a14e19 Compare April 14, 2025 17:30
@Kobzol Kobzol changed the title [WIP] Test PGO for Windows on CI Allow training PGO on a custom crate and enable it Windows on CI Apr 15, 2025
@Kobzol
Copy link
Contributor Author

Kobzol commented Apr 15, 2025

Tested that now with clap PGO works for x64 and i686 Windows.

r? @Veykril

Copy link
Member

@lnicola lnicola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if training on the standard library wouldn't be sufficient, but this looks reasonable too.

@Kobzol
Copy link
Contributor Author

Kobzol commented Apr 15, 2025

That would require looking up the sources, which is IMO similar complexity to downloading a crate from github. Also "normal" crates might be a teeny tiny bit more representative than stdlib, although here it probably doesn't matter that much.

@lnicola
Copy link
Member

lnicola commented Apr 15, 2025

They shouldn't be too hard to find, we do rust-analyzer analysis-stats --with-deps --no-sysroot --no-test $(rustc --print sysroot)/lib/rustlib/src/rust/library/ -q in another workflow.

I don't have any opinion on which is more representative, though.

@zamazan4ik
Copy link
Contributor

zamazan4ik commented Apr 15, 2025

I've made measurements on my Windows machine.

Environment:

For benchmark purposes, I used rust-analyzer.exe analysis-stats --run-all-ide-things <rust-analyzer-checkout-directory> command. Measurements are done multiple times, in different orders, etc. First runs are skipped due to "Antimalware Service Executable" thing (I was too lazy to add the binaries to the exceptions). The results are stable across runs.

Results (the Total number at the end of the execution. Providing the full log will be too lengthy):

  • No PGO: 200s
  • PGO: 170s

Nice improvement!

@Kobzol
Copy link
Contributor Author

Kobzol commented Apr 15, 2025

@lnicola Are you fine with running on a custom crate? It seems like it takes ~20 minutes, but since the release workflow runs once per day, that shouldn't really be an issue IMO.

@lnicola
Copy link
Member

lnicola commented Apr 15, 2025

Yeah, just me give an hour or so, I want to test it, just in case.

@lnicola
Copy link
Member

lnicola commented Apr 15, 2025

Okay, I gave this a try, these are the best-of-three times on self (don't read too much into the decimals), without the IDE things:

  • PGO on rust-analyzer: 72.78 s
  • PGO on clap: 71.74 s
  • PGO on rustlib: 72.54 s
  • no PGO: 93.09 s

Given the xtask dist time difference (386 s vs. 182 s), I'd be fine with training only on clap on all platforms, what do you think? We could also throw in --parallel, but I don't feel like testing that 😅.

For future reference:

diff --git i/xtask/src/dist.rs w/xtask/src/dist.rs
index 7e26167725..7df0ffac42 100644
--- i/xtask/src/dist.rs
+++ w/xtask/src/dist.rs
@@ -169,7 +169,7 @@ fn gather_pgo_profile<'a>(
     // Figure out a path to `llvm-profdata`
     let target_libdir = cmd!(sh, "rustc --print=target-libdir")
         .read()
-        .context("cannot resolve target-libdir from rustc")?;
+        .context("cannot resolve rustc target-libdir")?;
     let target_bindir = PathBuf::from(target_libdir).parent().unwrap().join("bin");
     let llvm_profdata = target_bindir.join("llvm-profdata").with_extension(EXE_EXTENSION);
 
@@ -178,10 +178,20 @@ fn gather_pgo_profile<'a>(
         ra_build_cmd.env("RUSTFLAGS", format!("-Cprofile-generate={}", pgo_dir.to_str().unwrap()));
     cmd_gather.run().context("cannot build rust-analyzer with PGO instrumentation")?;
 
-    let (train_path, label) = match &train_crate {
-        PgoTrainingCrate::RustAnalyzer => (PathBuf::from("."), "itself"),
+    let (train_path, label, extra_args) = match &train_crate {
+        PgoTrainingCrate::RustAnalyzer => (PathBuf::from("."), "itself", Vec::new()),
+        PgoTrainingCrate::RustLib => {
+            let sysroot = cmd!(sh, "rustc +nightly --print=sysroot")
+                .read()
+                .context("cannot resolve rustc sysroot")?;
+            (
+                PathBuf::from(format!("{sysroot}/lib/rustlib/src/rust/library/")),
+                "the Rust standard library",
+                vec!["--with-deps", "--no-sysroot", "--no-test"],
+            )
+        }
         PgoTrainingCrate::GitHub(url) => {
-            (download_crate_for_training(sh, &pgo_dir, url)?, url.as_str())
+            (download_crate_for_training(sh, &pgo_dir, url)?, url.as_str(), Vec::new())
         }
     };
 
@@ -189,7 +199,7 @@ fn gather_pgo_profile<'a>(
     eprintln!("Training RA on {label}...");
     cmd!(
         sh,
-        "target/{target}/release/rust-analyzer analysis-stats -q --run-all-ide-things {train_path}"
+        "target/{target}/release/rust-analyzer analysis-stats -q --run-all-ide-things {extra_args...} {train_path}"
     )
     .run()
     .context("cannot generate PGO profiles")?;
diff --git i/xtask/src/flags.rs w/xtask/src/flags.rs
index 700806d178..9607e93e15 100644
--- i/xtask/src/flags.rs
+++ w/xtask/src/flags.rs
@@ -145,8 +145,10 @@ pub struct RustcPush {
 
 #[derive(Debug)]
 pub enum PgoTrainingCrate {
-    // Use RA's own sources for PGO training
+    // Use RA's own sources for PGO training.
     RustAnalyzer,
+    // Use the Rust standard library for PGO training.
+    RustLib,
     // Download a Rust crate from `https://github.com/{0}` and use it for PGO training.
     GitHub(String),
 }
@@ -157,6 +159,7 @@ impl FromStr for PgoTrainingCrate {
     fn from_str(s: &str) -> Result<Self, Self::Err> {
         match s {
             "rust-analyzer" => Ok(Self::RustAnalyzer),
+            "rustlib" => Ok(Self::RustLib),
             url => Ok(Self::GitHub(url.to_owned())),
         }
     }

@Kobzol
Copy link
Contributor Author

Kobzol commented Apr 15, 2025

Given the xtask dist time difference (386 s vs. 182 s)

Could you please post these timings, to get a better estimation of how long did xtask run? Or did you take this from CI?

@lnicola
Copy link
Member

lnicola commented Apr 15, 2025

xtask dist --pgo clap-rs/clap took 182 s on my computer, while xtask dist --pgo rust-analyzer took 386 s.

@lnicola lnicola added this pull request to the merge queue Apr 15, 2025
@Kobzol
Copy link
Contributor Author

Kobzol commented Apr 15, 2025

Oh, I see. Ok, I will use clap in a follow-up PR where I will enable PGO also for Linux then.

Merged via the queue into rust-lang:master with commit f766d14 Apr 15, 2025
14 checks passed
@Kobzol Kobzol deleted the pgo-windows branch April 15, 2025 18:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants