Sanity check profiler atomics #113448

Closed

compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp: 8 changes (6 additions, 2 deletions)
@@ -740,14 +740,18 @@ LLVMRustOptimize(

   if (InstrumentCoverage) {
     PipelineStartEPCallbacks.push_back(
-      [InstrProfileOutput](ModulePassManager &MPM, OptimizationLevel Level) {
+      [InstrProfileOutput, TargetTriple](ModulePassManager &MPM, OptimizationLevel Level) {
         InstrProfOptions Options;
         if (InstrProfileOutput) {
           Options.InstrProfileOutput = InstrProfileOutput;
         }
         // cargo run tests in multhreading mode by default
         // so use atomics for coverage counters
-        Options.Atomic = true;
+        // This only works on platforms that support 64 bit atomic operations
+        // So, don't do it on 32 bit platforms
+        if (TargetTriple.isArch64Bit()){
+          Options.Atomic = true;
+        }

Member

It does work on i686-unknown-linux-gnu, at least, so this must be more nuanced than just being 64-bit.

Author

Yes and no. 32-bit architectures can’t do them in a single operation, because their registers aren’t wide enough to hold the 64-bit value the counter uses.

Some 32-bit targets (x86, and ARM subtargets that support sync) have library implementations of the operation.

But those library implementations are slow, which means a significant slowdown and a change in the program’s timing.

So it’s a trade-off: either better count accuracy at the cost of a slow program whose timing is influenced by the profiler, or a program that runs closer to normal and works on all 32-bit platforms, at the cost of potential undercounts in the profiler. It can’t undercount to 0, though.

The primary use here, code coverage measurement, is unaffected by the potential inaccuracy.

Member

Are we introducing UB if we instrument a data race by non-atomic updates?

That's not a rhetorical question - I really don't know. Maybe it's done low enough in the stack that such formal UB doesn't exist, but I think we should be very sure about it.

Author

UB = undefined behavior?

The variables being incremented are distinct per function, so a race can only happen between threads calling the same function. The worst case is that both threads read the counter and increment it, a thread switch happens, and one thread’s store overwrites the other’s increment.

The net result is an undercount by 1 any time there is a collision.

LLVM’s profiler ran this way exclusively for more than a decade. The option for atomics was added, but it still defaults to off.

We could also check for x86 and ARM and turn it on for them. I don’t really have a stake either way.

My personal opinion is that I’d rather trade a little count accuracy for more normal performance. That makes analysis of races and other timing-critical behavior more useful.

Member

We have target information about this support at the Rust level, cfg(target_has_atomic="64") and max_atomic_width(), so maybe we can just pass that in as yet-another parameter? Either making that bool InstrumentCoverage a tri-state flag, or adding another bool for whether to use atomics.

(LLVMRustOptimize is getting so many parameters that we might want a new struct...)

Author

IMHO, this is absolutely the right answer. I'm not sure that I'm well versed enough in all of this to do that, but I'd be willing to give it a shot if I could impose on somebody for a "block diagram" of where it would need to be wired in.

Member

The definition of LLVMRustOptimize here needs to change, either so that the InstrumentCoverage parameter takes the newly introduced type, or to add a new parameter:

extern "C" LLVMRustResult
LLVMRustOptimize(
    LLVMModuleRef ModuleRef,
    LLVMTargetMachineRef TMRef,
    LLVMRustPassBuilderOptLevel OptLevelRust,
    LLVMRustOptStage OptStage,
    bool NoPrepopulatePasses, bool VerifyIR, bool UseThinLTOBuffers,
    bool MergeFunctions, bool UnrollLoops, bool SLPVectorize, bool LoopVectorize,
    bool DisableSimplifyLibCalls, bool EmitLifetimeMarkers,
    LLVMRustSanitizerOptions *SanitizerOptions,
    const char *PGOGenPath, const char *PGOUsePath,
    bool InstrumentCoverage, const char *InstrProfileOutput,

Then the binding in cg_llvm to that function needs to be changed to match:

pub fn LLVMRustOptimize<'a>(

And the call site of LLVMRustOptimize needs to be altered accordingly:

let result = llvm::LLVMRustOptimize(

You may have trouble getting the "does our target have atomics?" information at this point. If so, you may need to work backwards a bit, possibly by making sure the information is accessible via the "god object" CodegenContext:

pub(crate) unsafe fn llvm_optimize(
    cgcx: &CodegenContext<LlvmCodegenBackend>,

Which is documented here:
https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/back/write/struct.CodegenContext.html

And parameterized by this:
https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/struct.LlvmCodegenBackend.html

Author

@workingjubilee Thanks!

         MPM.addPass(InstrProfiling(Options, false));
       }
     );