
[fatlto] Add coroutine passes when using FatLTO with ThinLTO #134434

Merged: 2 commits, Apr 7, 2025
43 changes: 43 additions & 0 deletions clang/test/CodeGenCoroutines/pr134409.cpp
@@ -0,0 +1,43 @@
// An end-to-end test to make sure coroutine passes are added for thinlto.
// REQUIRES: x86-registered-target
// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -std=c++23 -ffat-lto-objects -flto=thin -emit-llvm %s -O3 -o - \
// RUN: | FileCheck %s

#include "Inputs/coroutine.h"

class BasicCoroutine {
public:
  struct Promise {
    BasicCoroutine get_return_object() { return BasicCoroutine {}; }

    void unhandled_exception() noexcept { }

    void return_void() noexcept { }

    std::suspend_never initial_suspend() noexcept { return {}; }
    std::suspend_never final_suspend() noexcept { return {}; }
  };
  using promise_type = Promise;
};

// COM: match the embedded module, so we don't match something in it by accident.
// CHECK: @llvm.embedded.object = {{.*}}
// CHECK: @llvm.compiler.used = {{.*}}

BasicCoroutine coro() {
// CHECK: define {{.*}} void @_Z4corov() {{.*}} {
// CHECK-NEXT: entry:
// CHECK-NEXT: ret void
// CHECK-NEXT: }
  co_return;
}

int main() {
// CHECK: define {{.*}} i32 @main() {{.*}} {
// CHECK-NEXT: entry:
// CHECK-NEXT: tail call void @_Z4corov()
// CHECK-NEXT: ret i32 0
// CHECK-NEXT: }
  coro();
}

13 changes: 13 additions & 0 deletions llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -1692,6 +1692,19 @@ PassBuilder::buildFatLTODefaultPipeline(OptimizationLevel Level, bool ThinLTO,
  if (ThinLTO && PGOOpt && PGOOpt->Action == PGOOptions::SampleUse)
    MPM.addPass(buildThinLTODefaultPipeline(Level, /*ImportSummary=*/nullptr));
  else {
    // ModuleSimplification does not run the coroutine passes for
    // ThinLTOPreLink, so we need the coroutine passes to run for ThinLTO
    // builds, otherwise they will miscompile.
    if (ThinLTO) {
      // TODO: replace w/ buildCoroWrapper() when it takes phase and level into
      // consideration.
      CGSCCPassManager CGPM;
      CGPM.addPass(CoroSplitPass(Level != OptimizationLevel::O0));
      CGPM.addPass(CoroAnnotationElidePass());
      MPM.addPass(createModuleToPostOrderCGSCCPassAdaptor(std::move(CGPM)));
      MPM.addPass(CoroCleanupPass());
Review comment on lines +1699 to +1705 (Contributor Author):

It's also possible to do MPM.addPass(buildModuleSimplificationPipeline(Level, ThinOrFullLTOPhase::ThinLTOPostLink)); here.

That seems to generate better code, since it allows the tail call in the test to be removed, but it also seems a bit expensive just to give the inliner a second chance.
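
For illustration, a minimal sketch of how that alternative would slot into the branch added by this PR; this is the suggestion above sketched against the diff, not the code that was merged:

    if (ThinLTO) {
      // Alternative floated in this thread: rerun the full module
      // simplification pipeline as a ThinLTO post-link step instead of only
      // adding the coroutine passes. Better codegen in the test, at the cost
      // of extra compile time (see the numbers below).
      MPM.addPass(buildModuleSimplificationPipeline(
          Level, ThinOrFullLTOPhase::ThinLTOPostLink));
    }

    // otherwise, just use module optimization (unchanged from the diff)
    MPM.addPass(
        buildModuleOptimizationPipeline(Level, ThinOrFullLTOPhase::None));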

Reply (Contributor):

I think the current PR is a good way to start, as it avoids the miscompilation in a minimal way that is suitable for backporting.

Using the module simplification pipeline here (+ the module optimization pipeline) would effectively be the same as using the ThinLTO post-link pipeline.

I ran some tests, and it looks like adding -ffat-lto-objects adds about 10% overhead to the clang build (https://llvm-compile-time-tracker.com/compare.php?from=44923d8631fb28b4de54d4210762f256c3894cef&to=2bdf721c2d37af6fcbd931d963f586478cb55f17&stat=instructions:u). On top of that, switching this from using the module optimization pipeline to the ThinLTO post-link pipeline (your original PR) adds an extra 4%: https://llvm-compile-time-tracker.com/compare.php?from=2bdf721c2d37af6fcbd931d963f586478cb55f17&to=63666418fbe19f30bf971796747a751b4e1c57f3&stat=instructions:u

Adding 4% overhead is not great, but also not terrible, so maybe it's worth it to avoid pipeline compatibility issues in a principled way. It does make the codegen for FatLTO ELF diverge more from a normal compilation though.

Ideally we'd be able to rerun the simplification pipeline, but skip the inliner pipeline for already optimized functions. (I think @aeubanks had a prototype that did that for the actual ThinLTO scenario, by looking at available_externally functions. The situation here is somewhat different.)

Reply (Contributor Author):

Hmm, 10% seems a bit high for build-time overhead, though we haven't used it much with ThinLTO in our toolchain, so maybe that's it?

Looking at our build times when we enabled it in our toolchain, we saw about a 2.5% slowdown in total build time, but a 22% improvement in test time (ninja check-*). Overall that ended up being about a 4.4% speedup in total time.

So I'm not surprised it slowed the build itself down, but I am surprised it added a full 10%. Then again, I/O can vary a lot between machines, so maybe that's enough to explain it, since for ThinLTO it probably more than doubles the size of the .o.

    }

    // otherwise, just use module optimization
    MPM.addPass(
        buildModuleOptimizationPipeline(Level, ThinOrFullLTOPhase::None));
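
For context, a rough sketch of how a caller might request this pipeline when both FatLTO and ThinLTO are enabled. This is not code from this PR; the analysis-manager setup is generic boilerplate, and the name of the third argument to buildFatLTODefaultPipeline is an assumption, since the hunk header above only shows (OptimizationLevel Level, bool ThinLTO, ...):

    #include "llvm/IR/Module.h"
    #include "llvm/Passes/PassBuilder.h"

    // Sketch of a caller: build and run the FatLTO pipeline with ThinLTO
    // semantics so the branch touched by this patch is exercised.
    static void runFatThinLTOPipeline(llvm::Module &M) {
      llvm::PassBuilder PB;
      llvm::LoopAnalysisManager LAM;
      llvm::FunctionAnalysisManager FAM;
      llvm::CGSCCAnalysisManager CGAM;
      llvm::ModuleAnalysisManager MAM;
      PB.registerModuleAnalyses(MAM);
      PB.registerCGSCCAnalyses(CGAM);
      PB.registerFunctionAnalyses(FAM);
      PB.registerLoopAnalyses(LAM);
      PB.crossRegisterProxies(LAM, FAM, CGAM, MAM);

      // ThinLTO=true selects the branch changed here; the third argument's
      // name (EmitSummary) is assumed, as it is not visible in the diff.
      llvm::ModulePassManager MPM = PB.buildFatLTODefaultPipeline(
          llvm::OptimizationLevel::O3, /*ThinLTO=*/true, /*EmitSummary=*/true);
      MPM.run(M, MAM);
    }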