Crash while trying to instrument a binary in AArch64 (Assertion `MO.isImm() && "did not expect relocated expression"') #128282

Closed
jcgomezv opened this issue Feb 22, 2025 · 73 comments
Labels
BOLT crash Prefer [crash-on-valid] or [crash-on-invalid]

Comments

@jcgomezv

BOLT-INFO: first alloc address is 0x400000

BOLT-INFO: creating new program header table at address 0x4600000, offset 0x4200000

BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.

BOLT-INFO: enabling relocation mode

BOLT-INFO: forcing -jump-tables=move for instrumentation

BOLT-WARNING: 14 collisions detected while hashing binary objects. Use -v=1 to see the list.

BOLT-INFO: number of removed linker-inserted veneers: 0

BOLT-INFO: 0 out of 40985 functions in the binary (0.0%) have non-empty execution profile

BOLT-INSTRUMENTER: Number of indirect call site descriptors: 10117

BOLT-INSTRUMENTER: Number of indirect call target descriptors: 40048

BOLT-INSTRUMENTER: Number of function descriptors: 39960

BOLT-INSTRUMENTER: Number of branch counters: 336327

BOLT-INSTRUMENTER: Number of ST leaf node counters: 247871

BOLT-INSTRUMENTER: Number of direct call counters: 66698

BOLT-INSTRUMENTER: Total number of counters: 650896

BOLT-INSTRUMENTER: Total size of counters: 5207168 bytes (static alloc memory)

BOLT-INSTRUMENTER: Total size of string table emitted: 2310032 bytes in file

BOLT-INSTRUMENTER: Total size of descriptors: 37711472 bytes in file

BOLT-INSTRUMENTER: Profile will be saved to file /tmp/prof.fdata

BOLT-INFO: removed 4975 empty blocks

BOLT-INFO: UCE removed 62903 blocks and 5076444 bytes of code

BOLT-INFO: Starting stub-insertion pass

BOLT-INFO: Inserted 0 stubs in the hot area and 0 stubs in the cold area. Shared 0 times, iterated 1 times.

llvm-bolt: /home/gomezjc/OpenSource/llvm-project/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp:235: unsigned int {anonymous}::AArch64MCCodeEmitter::getMachineOpValue(const llvm::MCInst&, const llvm::MCOperand&, llvm::SmallVectorImpl<llvm::MCFixup>&, const llvm::MCSubtargetInfo&) const: Assertion `MO.isImm() && "did not expect relocated expression"' failed.

Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var LLVM_SYMBOLIZER_PATH to point to it):

0 llvm-bolt 0x0000000000e58b10

1 llvm-bolt 0x0000000000e56560

2 linux-vdso.so.1 0x0000ffff8fbb9834 __kernel_rt_sigreturn + 0

3 libc.so.6 0x0000ffff8f6d8834 gsignal + 180

4 libc.so.6 0x0000ffff8f6da140 abort + 352

5 libc.so.6 0x0000ffff8f6d1780

6 libc.so.6 0x0000ffff8f6d17fc

7 llvm-bolt 0x0000000000835990

8 llvm-bolt 0x00000000008373e4

9 llvm-bolt 0x0000000000ca178c

10 llvm-bolt 0x000000000160e968

11 llvm-bolt 0x0000000001610680

12 llvm-bolt 0x0000000001610ed8

13 llvm-bolt 0x0000000000f35bb8

14 llvm-bolt 0x0000000000f38a18

15 llvm-bolt 0x000000000040b428

16 libc.so.6 0x0000ffff8f6c5da4 __libc_start_main + 228

17 llvm-bolt 0x0000000000486510

PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.

Stack dump:

  1.  Program arguments: llvm-bolt postgres -instrument -o postgres-inst
    

zsh: abort (core dumped) llvm-bolt postgres -instrument -o postgres-inst

@EugeneZelenko EugeneZelenko added BOLT crash Prefer [crash-on-valid] or [crash-on-invalid] and removed new issue labels Feb 22, 2025
@jcgomezv
Author

Interestingly, a similar version of the binary can be instrumented without a crash, and the profile collected from it yields good performance improvements.

The same code compiled for x86 can also be instrumented without issues.

GCC 10.5 is used to build the binaries.

@llvmbot
Member

llvmbot commented Feb 22, 2025

@llvm/issue-subscribers-bolt

Author: None (jcgomezv)

@paschalis-mpeis
Member

@jcgomezv can you please add more details on the reproducer?

Which postgres version did you use, and how exactly was the input binary compiled?

I assume that on the BOLT side you only ran llvm-bolt postgres -instrument -o postgres-inst.

@jcgomezv
Author

Dear @paschalis-mpeis :

The postgres version is 17.2 and compiler versions are:

g++ version: aarch64-unknown-linux-gnu-g++ (GCC) 10.5.0
c++ version: aarch64-unknown-linux-gnu-g++ (GCC) 10.5.0
cpp version: cpp (GCC) 10.5.0
gcc version: aarch64-unknown-linux-gnu-gcc (GCC) 10.5.0
make version: GNU Make 4.4.1
python version: Python 3.9.19
perl version: (v5.30.3)

That is correct: I only got as far as the instrumentation phase, because when it fails I cannot collect perf data.

-Wl,--emit-relocs was used in the link phase
and -fno-reorder-blocks-and-partition was used for the C compiler.

I can collect full logs for the BOLT run with -v if that helps, or I can recompile the BOLT tool with any printouts that may help diagnose this.

@jcgomezv
Author

BOLT-INFO: Starting pass: clean-mc-state
BOLT-INFO: Finished pass: clean-mc-state
llvm-bolt: /home/gomezjc/OpenSource/llvm-project/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp:235: unsigned int {anonymous}::AArch64MCCodeEmitter::getMachineOpValue(const llvm::MCInst&, const llvm::MCOperand&, llvm::SmallVectorImpl<llvm::MCFixup>&, const llvm::MCSubtargetInfo&) const: Assertion `MO.isImm() && "did not expect relocated expression"' failed.

Thread 1 "llvm-bolt" received signal SIGABRT, Aborted.
0x0000fffff7b1b834 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install libgcc-7.3.1-17.amzn2.aarch64 libstdc++-7.3.1-17.amzn2.aarch64 zlib-1.2.7-19.amzn2.0.3.aarch64
(gdb) where
#0 0x0000fffff7b1b834 in raise () from /lib64/libc.so.6
#1 0x0000fffff7b1d140 in abort () from /lib64/libc.so.6
#2 0x0000fffff7b14780 in __assert_fail_base () from /lib64/libc.so.6
#3 0x0000fffff7b147fc in __assert_fail () from /lib64/libc.so.6
#4 0x0000000000835990 in (anonymous namespace)::AArch64MCCodeEmitter::getMachineOpValue(llvm::MCInst const&, llvm::MCOperand const&, llvm::SmallVectorImpl<llvm::MCFixup>&, llvm::MCSubtargetInfo const&) const [clone .constprop.0] ()
#5 0x00000000008373e4 in (anonymous namespace)::AArch64MCCodeEmitter::encodeInstruction(llvm::MCInst const&, llvm::SmallVectorImpl<char>&, llvm::SmallVectorImpl<llvm::MCFixup>&, llvm::MCSubtargetInfo const&) const ()
#6 0x0000000000ca178c in llvm::MCELFStreamer::emitInstToData(llvm::MCInst const&, llvm::MCSubtargetInfo const&) [clone .localalias] ()
#7 0x000000000160e968 in (anonymous namespace)::BinaryEmitter::emitFunction(llvm::bolt::BinaryFunction&, llvm::bolt::FunctionFragment&) ()
#8 0x0000000001610680 in (anonymous namespace)::BinaryEmitter::emitFunctions()::{lambda(std::vector<llvm::bolt::BinaryFunction*, std::allocator<llvm::bolt::BinaryFunction*> > const&)#1}::operator()(std::vector<llvm::bolt::BinaryFunction*, std::allocator<llvm::bolt::BinaryFunction*> > const&) const ()
#9 0x0000000001610ed8 in llvm::bolt::emitBinaryContext(llvm::MCStreamer&, llvm::bolt::BinaryContext&, llvm::StringRef) ()
#10 0x0000000000f35bb8 in llvm::bolt::RewriteInstance::emitAndLink() [clone .localalias] ()
#11 0x0000000000f38a18 in llvm::bolt::RewriteInstance::run() ()
#12 0x000000000040b428 in main ()

@jcgomezv
Author

I instrumented the BOLT code to print the opcode of each instruction and see where we break. Here is what I got:

AArch64MCCodeEmitter::encodeInstruction(opcode=4788
AArch64MCCodeEmitter::encodeInstruction(opcode=5569
AArch64MCCodeEmitter::encodeInstruction(opcode=7311
llvm-bolt: /home/gomezjc/OpenSource/llvm-project/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp:236: unsigned int {anonymous}::AArch64MCCodeEmitter::getMachineOpValue(const llvm::MCInst&, const llvm::MCOperand&, llvm::SmallVectorImpl<llvm::MCFixup>&, const llvm::MCSubtargetInfo&) const: Assertion `MO.isImm() && "did not expect relocated expression"' failed.

But the same instruction was processed successfully before...

@jcgomezv
Author

And from gdb:

#5 0x0000000000a55858 in (anonymous namespace)::AArch64MCCodeEmitter::getBinaryCodeForInstr (this=0x23648d90,
MI=..., Fixups=..., STI=...)
at /home/gomezjc/OpenSource/build/lib/Target/AArch64/AArch64GenMCCodeEmitter.inc:17778
17778 op = getMachineOpValue(MI, MI.getOperand(4), Fixups, STI);
(gdb) info locals
msg = ""
Msg = {llvm::raw_ostream = {_vptr.raw_ostream = 0x0, Kind = llvm::raw_ostream::OStreamKind::OK_OStream,
OutBufStart = 0x0, OutBufEnd = 0x0, OutBufCur = 0x0, ColorEnabled = false,
BufferMode = llvm::raw_ostream::BufferKind::Unbuffered, static BLACK = llvm::raw_ostream::Colors::BLACK,
static RED = llvm::raw_ostream::Colors::RED, static GREEN = llvm::raw_ostream::Colors::GREEN,
static YELLOW = llvm::raw_ostream::Colors::YELLOW, static BLUE = llvm::raw_ostream::Colors::BLUE,
static MAGENTA = llvm::raw_ostream::Colors::MAGENTA, static CYAN = llvm::raw_ostream::Colors::CYAN,
static WHITE = llvm::raw_ostream::Colors::WHITE,
static BRIGHT_BLACK = llvm::raw_ostream::Colors::BRIGHT_BLACK,
static BRIGHT_RED = llvm::raw_ostream::Colors::BRIGHT_RED,
static BRIGHT_GREEN = llvm::raw_ostream::Colors::BRIGHT_GREEN,
static BRIGHT_YELLOW = llvm::raw_ostream::Colors::BRIGHT_YELLOW,
static BRIGHT_BLUE = llvm::raw_ostream::Colors::BRIGHT_BLUE,
static BRIGHT_MAGENTA = llvm::raw_ostream::Colors::BRIGHT_MAGENTA,
static BRIGHT_CYAN = llvm::raw_ostream::Colors::BRIGHT_CYAN,
static BRIGHT_WHITE = llvm::raw_ostream::Colors::BRIGHT_WHITE,
static SAVEDCOLOR = llvm::raw_ostream::Colors::SAVEDCOLOR, static RESET = llvm::raw_ostream::Colors::RESET},
OS = "\300\331\377\377\377\377\000\000$\240\v\001\000\000\000\000 +\223~\000\000\000\000 +\223~\000\000\000\000\340\370\246\033\000\000\000\000X\304\001\001\000\000\000\000 +\223~\000\000\000\000 \362\274\a\000\000\000\000@"\000\250\377\377\000\000 +\223~\000\000\000\000 +\223~\000\000\000\000xgi\031\000\000\000\000\200\331\377\377\377\377\000\000\364\263\001\001", '\000' <repeats 12 times>, "xgi\031\000\000\000\000\240\331\377\377\377\377\000\000ح\001\001", '\000' <repeats 20 times>, "\300\331\377\377\377\377\000\000pgi\031\000\000\000\000@"\000\250\377\377\000\000\340\036ج\000\000\000\000\240\332\377\377\377\377\000\000"...}
InstBits = <error reading variable InstBits (value requires 70704 bytes, which is more than max-value-size)>
opcode = 7311
Value = 2843769853
op = 992
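
As a cross-check on the locals above, the partially built `Value` can be split into the fields of an STP (64-bit, pre-index) word. This is a minimal sketch in Python: the field positions are taken from the Arm encoding of STP (pre-index), and the helper name `decode_stp_pre` is made up for this illustration. It suggests that the two data registers and the base register were already packed, while the 7-bit immediate field was still zero when the assertion fired.

```python
# Sketch: decode the fields of an AArch64 STP (64-bit, pre-index) word.
# decode_stp_pre is a hypothetical helper, not part of LLVM or BOLT.

def decode_stp_pre(word: int) -> dict:
    """Split a 32-bit STP pre-index encoding into its fields."""
    imm7 = (word >> 15) & 0x7F
    if imm7 & 0x40:              # sign-extend the 7-bit immediate field
        imm7 -= 0x80
    return {
        "fixed": hex(word & 0xFFC00000),  # opc/V/L plus the pre-index class bits
        "offset": imm7 * 8,               # imm7 is scaled by 8 for X registers
        "rt2": (word >> 10) & 0x1F,
        "rn": (word >> 5) & 0x1F,
        "rt": word & 0x1F,
    }

# `Value = 2843769853` from the gdb locals above (0xA9807BFD).
partial = decode_stp_pre(2843769853)
print(partial)  # rt=29 (x29), rt2=30 (x30), rn=31 (sp), offset still 0

# `op = 992` is the base register already shifted into place: 31 << 5.
print(992 == 31 << 5)
```

For comparison, a fully encoded `stp x29, x30, [sp, #-0x10]!` is `0xA9BF7BFD`; the only difference from `Value` is the immediate field.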

@jcgomezv
Author

(gdb) p MI
$1 = (const llvm::MCInst &) @0xffffa8002240: {Opcode = 7311, Flags = 0, Loc = {Ptr = 0x0},
Operands = {<llvm::SmallVectorImpl<llvm::MCOperand>> = {<llvm::SmallVectorTemplateBase<llvm::MCOperand, true>> = {<llvm::SmallVectorTemplateCommon<llvm::MCOperand, void>> = {<llvm::SmallVectorBase<unsigned int>> = {
BeginX = 0xffffa8002260, Size = 5, Capacity = 6}, <No data fields>},
static TakesParamByValue = true}, <No data fields>}, <llvm::SmallVectorStorage<llvm::MCOperand, 6>> = {
InlineElts = "\001\333\377\377\b\000\000\000\b\000\000\000\000\000\000\000\001\000\000\000\002\000\000\000\002\000\000\000\000\000\000\000\001\000\000\000\006\000\000\000\006\000\000\000\000\000\000\000\001\000\000\000\b\000\000\000\b\000\000\000\000\000\000\000\005\337\377\377\377\377\000\000`\371\246\033", '\000' <repeats 19 times>}, <No data fields>}}

@jcgomezv
Author

(gdb) p *this
$2 = {llvm::MCCodeEmitter = {
_vptr.MCCodeEmitter = 0x76cd818 <vtable for (anonymous namespace)::AArch64MCCodeEmitter+16>},
Ctx = @0x7bd60c0}
(gdb) p MCNumEmitted
$3 = {DebugType = 0x411d3a0 "mccodeemitter", Name = 0x411d3b0 "MCNumEmitted",
Desc = 0x411d3c0 "Number of MC instructions emitted.", Value = {<std::__atomic_base> = {
static _S_alignment = 8, _M_i = 15}, static is_always_lock_free = true}, Initialized = {_M_base = {
static _S_alignment = 1, _M_i = true}, static is_always_lock_free = true}}
(gdb)

@jcgomezv
Author

(gdb) p Fixups
$4 = (llvm::SmallVectorImpl<llvm::MCFixup> &) @0xacd81f40: {<llvm::SmallVectorTemplateBase<llvm::MCFixup, true>> = {<llvm::SmallVectorTemplateCommon<llvm::MCFixup, void>> = {<llvm::SmallVectorBase<unsigned int>> = {
BeginX = 0xacd81f50, Size = 0, Capacity = 4}, <No data fields>},
static TakesParamByValue = false}, <No data fields>}
(gdb) p STI
$5 = (const llvm::MCSubtargetInfo &) @0x7bcf220: {
_vptr.MCSubtargetInfo = 0x77fe740 <vtable for llvm::AArch64GenMCSubtargetInfo+16>, TargetTriple = {
Data = "aarch64--linux", Arch = llvm::Triple::aarch64, SubArch = llvm::Triple::ARMSubArch_v8,
Vendor = llvm::Triple::UnknownVendor, OS = llvm::Triple::Linux,
Environment = llvm::Triple::UnknownEnvironment, ObjectFormat = llvm::Triple::ELF}, CPU = "generic",
TuneCPU = "generic", ProcNames = {Data = 0x78c7e00 <llvm::AArch64Names>, Length = 89}, ProcFeatures = {
Data = 0x75cfb50 <llvm::AArch64FeatureKV>, Length = 310}, ProcDesc = {
Data = 0x76c78c8 <llvm::AArch64SubTypeKV>, Length = 72},
WriteProcResTable = 0x40babf8 <llvm::AArch64WriteProcResTable>,
WriteLatencyTable = 0x40beb80 <llvm::AArch64WriteLatencyTable>,
ReadAdvanceTable = 0x40bf2b0 <llvm::AArch64ReadAdvanceTable>,
CPUSchedModel = 0x76c4198 <llvm::CortexA510Model>, Stages = 0x0, OperandCycles = 0x0, ForwardingPaths = 0x0,
FeatureBits = {Bits = {_M_elems = {1688849860264192, 288232575174971906, 0, 524288, 0}}},
FeatureString = "+all"}

@jcgomezv
Author

I changed the code in BOLT to display more about the failing instruction by using MCInstrInfo and got this:

AArch64MCCodeEmitter::encodeInstruction(opcode=7311)
AArch64MCCodeEmitter::encodeInstruction(instruction=STPXPRE)
llvm-bolt: /home/gomezjc/OpenSource/llvm-project/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp:237: unsigned int {anonymous}::AArch64MCCodeEmitter::getMachineOpValue(const llvm::MCInst&, const llvm::MCOperand&, llvm::SmallVectorImpl<llvm::MCFixup>&, const llvm::MCSubtargetInfo&) const: Assertion `MO.isImm() && "did not expect relocated expression"' failed.
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0 llvm-bolt 0x0000000001467f28
1 llvm-bolt 0x00000000014682bc
2 llvm-bolt 0x0000000001465e88
3 llvm-bolt 0x0000000001467880
4 linux-vdso.so.1 0x0000ffffba60c834 __kernel_rt_sigreturn + 0
5 libc.so.6 0x0000ffffba12b834 gsignal + 180
6 libc.so.6 0x0000ffffba12d140 abort + 352

The instruction is expected to have the following format, but somehow the immediate value is missing by the time we encode the instruction:

STPXPRE Xt1, Xt2, [Xn, #imm]!
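
To illustrate why the encoder needs a concrete immediate here, the following Python sketch packs the pre-index STP form by hand. It assumes the standard Arm encoding (base word `0xA9800000`, a signed 7-bit offset field scaled by 8); `encode_stp_pre` is a hypothetical helper written for this thread, not an LLVM API. A symbolic expression has no numeric value to put into the `imm7` field, which is consistent with the assertion message.

```python
# Sketch: hand-encode "stp Xt1, Xt2, [Xn|SP, #imm]!" (64-bit, pre-index).
# encode_stp_pre is an illustrative helper, not part of LLVM or BOLT.

STP_X_PRE_BASE = 0xA9800000  # opc=10, V=0, pre-index class, L=0

def encode_stp_pre(rt: int, rt2: int, rn: int, offset: int) -> int:
    """Return the 32-bit encoding; offset must be a multiple of 8 in [-512, 504]."""
    if offset % 8 != 0 or not -512 <= offset <= 504:
        raise ValueError(f"offset {offset} does not fit the scaled imm7 field")
    imm7 = (offset // 8) & 0x7F  # two's-complement 7-bit field
    return STP_X_PRE_BASE | (imm7 << 15) | (rt2 << 10) | (rn << 5) | rt

# stp x29, x30, [sp, #-0x10]!  (register number 31 means SP in the base slot)
word = encode_stp_pre(rt=29, rt2=30, rn=31, offset=-0x10)
print(hex(word))  # 0xa9bf7bfd
```

The range check mirrors the architectural limit of the field; an offset that is not a multiple of 8 cannot be represented at all in this form.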

@jcgomezv
Author

More details on the encoding where we fail: I see five operands, but this instruction takes four:

(gdb) p MI.Operands.size()
$8 = 5
(gdb) p MI.Operands[0]
$9 = (llvm::MCOperand &) @0xffffa8002260: {Kind = llvm::MCOperand::kRegister, {RegVal = 8, ImmVal = 8, SFPImmVal = 8, FPImmVal = 8, ExprVal = 0x8,
InstVal = 0x8}}
(gdb) p MI.Operands[1]
$10 = (llvm::MCOperand &) @0xffffa8002270: {Kind = llvm::MCOperand::kRegister, {RegVal = 2, ImmVal = 2, SFPImmVal = 2, FPImmVal = 2, ExprVal = 0x2,
InstVal = 0x2}}
(gdb) p MI.Operands[2]
$11 = (llvm::MCOperand &) @0xffffa8002280: {Kind = llvm::MCOperand::kRegister, {RegVal = 6, ImmVal = 6, SFPImmVal = 6, FPImmVal = 6, ExprVal = 0x6,
InstVal = 0x6}}
(gdb) p MI.Operands[3]
$12 = (llvm::MCOperand &) @0xffffa8002290: {Kind = llvm::MCOperand::kRegister, {RegVal = 8, ImmVal = 8, SFPImmVal = 8, FPImmVal = 8, ExprVal = 0x8,
InstVal = 0x8}}
(gdb) p MI.Operands[4]
$13 = (llvm::MCOperand &) @0xffffa80022a0: {Kind = llvm::MCOperand::kExpr, {RegVal = 463927696, ImmVal = 463927696, SFPImmVal = 463927696,
FPImmVal = 463927696, ExprVal = 0x1ba6f990, InstVal = 0x1ba6f990}}
(gdb) p MI
$14 = (const llvm::MCInst &) @0xffffa8002240: {Opcode = 7311, Flags = 0, Loc = {Ptr = 0x0},
Operands = {<llvm::SmallVectorImpl<llvm::MCOperand>> = {<llvm::SmallVectorTemplateBase<llvm::MCOperand, true>> = {<llvm::SmallVectorTemplateCommon<llvm::MCOperand, void>> = {<llvm::SmallVectorBase<unsigned int>> = {BeginX = 0xffffa8002260, Size = 5, Capacity = 6}, <No data fields>},
static TakesParamByValue = true}, <No data fields>}, <llvm::SmallVectorStorage<llvm::MCOperand, 6>> = {
InlineElts = "\001\333\377\377\b\000\000\000\b\000\000\000\000\000\000\000\001\000\000\000\002\000\000\000\002\000\000\000\000\000\000\000\001\000\000\000\006\000\000\000\006\000\000\000\000\000\000\000\001\000\000\000\b\000\000\000\b\000\000\000\000\000\000\000\005\337\377\377\377\377\000\000\220\371\246\033", '\000' <repeats 19 times>}, <No data fields>}}
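
One possible reading of the dump above, sketched in Python. It assumes that LLVM's definition of the pre-indexed store pair lists a write-back result register ahead of the four assembly operands, which would account for a fifth MCInst operand; that operand order, the `ROLES` names and the `Operand` class are assumptions made for this illustration. Under that reading the register operands are all fine and only the offset slot holds an expression instead of an immediate.

```python
# Sketch: model the five operands printed by gdb and find the one the
# encoder cannot handle. Operand, ROLES and first_unencodable are
# hypothetical names; the role order is an assumption, not taken from the thread.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Operand:
    kind: str   # "reg", "imm" or "expr", mirroring MCOperand's kinds
    value: int  # register number, immediate, or expression pointer

ROLES = ["wback", "Rt", "Rt2", "Rn", "offset"]

# Values copied from the gdb session above.
operands = [
    Operand("reg", 8),
    Operand("reg", 2),
    Operand("reg", 6),
    Operand("reg", 8),
    Operand("expr", 0x1BA6F990),
]

def first_unencodable(ops: list) -> Optional[int]:
    """Index of the first operand that is neither a register nor an immediate."""
    for index, op in enumerate(ops):
        if op.kind not in ("reg", "imm"):
            return index
    return None

bad = first_unencodable(operands)
print(bad, ROLES[bad])  # 4 offset
# The write-back register and the base register are the same register.
print(operands[0].value == operands[3].value)
```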

@ilinpv
Contributor

ilinpv commented Feb 25, 2025

@jcgomezv could you try llvm-bolt with and without "-instrument", check the disassembly, and see if errors occur? llvm-bolt postgres -o postgres-inst --print-disasm -v 2 > postgres-disasm.log

@jcgomezv
Author

BOLT-INFO: setting size of function modf@PLT to 16 (was 0)
BOLT-INFO: setting size of function tzset@PLT to 16 (was 0)
BOLT-INFO: setting size of function localtime_r@PLT to 16 (was 0)
BOLT-INFO: setting size of function _start to 72 (was 0)
BOLT-INFO: setting size of function crc32_iscsi_refl_pmull to 48 (was 0)
BOLT-INFO: setting size of function _fini to 16 (was 0)
Binary Function "_init" after disassembly {
Number : 1
State : disassembled
Address : 0x6558c0
Size : 0x14
MaxSize : 0x14
Offset : 0x2558c0
Section : .init
Orc Section : .local.text._init
LSDA : 0x0
IsSimple : 1
IsMultiEntry: 0
IsSplit : 0
BB Count : 0
}
.LBB00:
00000000: stp x29, x30, [sp, #-0x10]!
00000004: mov x29, sp
00000008: bl "call_weak_fn/1" # Offset: 8
0000000c: ldp x29, x30, [sp], #0x10
00000010: ret # Offset: 16
DWARF CFI Instructions:

End of Function "_init"

Binary Function "brin_bloom_summary_in" after disassembly {
Number : 2
State : disassembled
Address : 0x659c00
Size : 0x58
MaxSize : 0x58
Offset : 0x259c00
Section : .text
Orc Section : .local.text.brin_bloom_summary_in
LSDA : 0x0
IsSimple : 1
IsMultiEntry: 0
IsSplit : 0
BB Count : 0
}
.LBB01:
00000000: stp x29, x30, [sp, #

@jcgomezv
Author

It crashes at the same place. The output tells me which instruction it is, and it breaks right where the immediate argument is missing:

BOLT-WARNING: writable reference into the middle of the function ExecInterpExpr/1(*2) detected at address 0x35a3678
llvm-bolt: /home/gomezjc/OpenSource/llvm-project/llvm/include/llvm/MC/MCInst.h:82: int64_t llvm::MCOperand::getImm() const: Assertion `isImm() && "This is not an immediate"' failed.
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0 llvm-bolt 0x0000000001468090
1 llvm-bolt 0x0000000001468424
2 llvm-bolt 0x0000000001465ff0
3 llvm-bolt 0x00000000014679e8
4 linux-vdso.so.1 0x0000ffffa4dc9834 __kernel_rt_sigreturn + 0
5 libc.so.6 0x0000ffffa48e8834 gsignal + 180
6 libc.so.6 0x0000ffffa48ea140 abort + 352
7 libc.so.6 0x0000ffffa48e1780
8 libc.so.6 0x0000ffffa48e17fc
9 llvm-bolt 0x00000000009c4cb8
10 llvm-bolt 0x00000000009edb0c
11 llvm-bolt 0x00000000009d9d58
12 llvm-bolt 0x00000000009e54c8
13 llvm-bolt 0x000000000202ca08
14 llvm-bolt 0x0000000002076d18
15 llvm-bolt 0x00000000015b68ec
16 llvm-bolt 0x00000000015a8d18
17 llvm-bolt 0x000000000040f680
18 libc.so.6 0x0000ffffa48d5da4 __libc_start_main + 228
19 llvm-bolt 0x000000000040e590
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:

@jcgomezv
Author

More info on the specific place where we crash:

AArch64MCCodeEmitter::encodeInstruction(opcode=7311)
AArch64MCCodeEmitter::encodeInstruction(Operands.size=5)
AArch64MCCodeEmitter::encodeInstruction(instruction=STPXPRE)
AArch64MCCodeEmitter::getMachineOpValue(MO.isReg()=1)
AArch64MCCodeEmitter::getMachineOpValue(MO.isImm()=0)
AArch64MCCodeEmitter::getMachineOpValue(regencoding=29)
AArch64MCCodeEmitter::getMachineOpValue(MO.isReg()=1)
AArch64MCCodeEmitter::getMachineOpValue(MO.isImm()=0)
AArch64MCCodeEmitter::getMachineOpValue(regencoding=30)
AArch64MCCodeEmitter::getMachineOpValue(MO.isReg()=1)
AArch64MCCodeEmitter::getMachineOpValue(MO.isImm()=0)
AArch64MCCodeEmitter::getMachineOpValue(regencoding=31)
AArch64MCCodeEmitter::getMachineOpValue(MO.isReg()=0)
AArch64MCCodeEmitter::getMachineOpValue(MO.isImm()=0)
llvm-bolt: /home/gomezjc/OpenSource/llvm-project/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp:241: unsigned int {anonymous}::AArch64MCCodeEmitter::getMachineOpValue(const llvm::MCInst&, const llvm::MCOperand&, llvm::SmallVectorImpl<llvm::MCFixup>&, const llvm::MCSubtargetInfo&) const: Assertion `MO.isImm() && "did not expect relocated expression"' failed.

Thread 1 "llvm-bolt" received signal SIGABRT, Aborted.
0x0000fffff7b1b834 in raise () from /lib64/libc.so.6
(gdb) up
#1  0x0000fffff7b1d140 in abort () from /lib64/libc.so.6
(gdb) up
#2  0x0000fffff7b14780 in __assert_fail_base () from /lib64/libc.so.6
(gdb) up
#3  0x0000fffff7b147fc in __assert_fail () from /lib64/libc.so.6
(gdb) up
#4  0x0000000000a45150 in (anonymous namespace)::AArch64MCCodeEmitter::getMachineOpValue (this=0x1ab74870, MI=..., MO=..., Fixups=..., STI=...)
    at /home/gomezjc/OpenSource/llvm-project/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp:241
241       assert(MO.isImm() && "did not expect relocated expression");
(gdb) p MI
$1 = (const llvm::MCInst &) @0xffffa8002240: {Opcode = 7311, Flags = 0, Loc = {Ptr = 0x0}, 
  Operands = {<llvm::SmallVectorImpl<llvm::MCOperand>> = {<llvm::SmallVectorTemplateBase<llvm::MCOperand, true>> = {<llvm::SmallVectorTemplateCommon<llvm::MCOperand, void>> = {<llvm::SmallVectorBase<unsigned int>> = {BeginX = 0xffffa8002260, Size = 5, Capacity = 6}, <No data fields>}, 
        static TakesParamByValue = true}, <No data fields>}, <llvm::SmallVectorStorage<llvm::MCOperand, 6>> = {
      InlineElts = "\001\333\377\377\b\000\000\000\b\000\000\000\000\000\000\000\001\000\000\000\002\000\000\000\002\000\000\000\000\000\000\000\001\000\000\000\006\000\000\000\006\000\000\000\000\000\000\000\001\000\000\000\b\000\000\000\b\000\000\000\000\000\000\000\005\337\377\377\377\377\000\000\220\371\246\033", '\000' <repeats 19 times>}, <No data fields>}}
(gdb) p MI.Operands[1].isReg()
$2 = true
(gdb) p MI.Operands[2].isReg()
$3 = true
(gdb) p MI.Operands[3].isReg()
$4 = true
(gdb) p MI.Operands[4].isReg()
$5 = false
(gdb) p MI.Operands[4].isImm()
$6 = false
(gdb) p MI.Operands[4]
$7 = (llvm::MCOperand &) @0xffffa80022a0: {Kind = llvm::MCOperand::kExpr, {RegVal = 463927696, ImmVal = 463927696, SFPImmVal = 463927696, 
    FPImmVal = 463927696, ExprVal = 0x1ba6f990, InstVal = 0x1ba6f990}}
(gdb) p MI.Operands[3]
$8 = (llvm::MCOperand &) @0xffffa8002290: {Kind = llvm::MCOperand::kRegister, {RegVal = 8, ImmVal = 8, SFPImmVal = 8, FPImmVal = 8, ExprVal = 0x8, 
    InstVal = 0x8}}
(gdb) 

Note that the operand at index 4 (the last one) seems corrupt

@ilinpv
Contributor

ilinpv commented Feb 25, 2025

Yes, it looks like it broke on "00000000: stp x29, x30, [sp, #smth corrupt]" in the brin_bloom_summary_in function. As a workaround, you can try to skip it with llvm-bolt --skip-funcs=brin_bloom_summary_in
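
A rough way to find the function name to skip is to look for the last function header in the truncated `--print-disasm` log. The Python sketch below assumes the header format seen in the log earlier in this thread (`Binary Function "<name>" after disassembly {`); the helper name `last_function_in_log` and the embedded sample lines are illustrative only.

```python
# Sketch: find the last function BOLT started printing before the log
# stopped, then build the matching --skip-funcs argument.
import re

HEADER = re.compile(r'^Binary Function "([^"]+)" after disassembly')

def last_function_in_log(lines):
    """Return the name in the last 'Binary Function "..."' header, or None."""
    last = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            last = match.group(1)
    return last

# A few lines shaped like the log posted above (the last one is cut off).
sample = [
    'Binary Function "_init" after disassembly {',
    '00000000: stp x29, x30, [sp, #-0x10]!',
    'End of Function "_init"',
    'Binary Function "brin_bloom_summary_in" after disassembly {',
    '00000000: stp x29, x30, [sp, #',
]

name = last_function_in_log(sample)
print(f"--skip-funcs={name}")  # --skip-funcs=brin_bloom_summary_in
```

On a real log, the lines could be read with `open("postgres-disasm.log")` and passed to the same function.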

@jcgomezv
Author

Hi Pavel: thanks for the suggestion. Let me try this and I will report back.

@jcgomezv
Author

That did the trick, thanks Pavel. I was able to generate an instrumented binary with the info you provided. Hopefully the debug info here will help you guys solve the bug!

@jcgomezv
Author

So it seems the instrumented binary I get after this fails to start: it breaks at the instruction LDR X1, [X1, #3688]. Any idea what else we can try here?

@jcgomezv
Author

The actual crash in the program is within the atexit() library call

@jcgomezv
Author

Image

Image

@jcgomezv
Author

Here are the mappings for the process memory, showing that the address above is in fact out of range:

Image
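
The same range check can be done mechanically on the text of `/proc/<pid>/maps`. The Python sketch below is only an illustration: the screenshots are not reproduced here, so the mapping lines and the two addresses are invented placeholders, and `parse_maps` and `is_mapped` are hypothetical helper names.

```python
# Sketch: test whether an address falls inside any mapping listed in
# /proc/<pid>/maps. All sample values below are invented placeholders.

def parse_maps(text: str) -> list:
    """Return (start, end) pairs from lines like '00400000-00452000 r-xp ...'."""
    ranges = []
    for line in text.splitlines():
        if not line.strip():
            continue
        span = line.split()[0]
        start, end = (int(part, 16) for part in span.split("-"))
        ranges.append((start, end))
    return ranges

def is_mapped(addr: int, ranges: list) -> bool:
    """True if addr lies in some half-open range [start, end)."""
    return any(start <= addr < end for start, end in ranges)

# Hypothetical mappings, not the ones from the screenshots.
sample_maps = """\
00400000-00452000 r-xp 00000000 08:01 1234 /tmp/example
00651000-00652000 rw-p 00051000 08:01 1234 /tmp/example
"""

ranges = parse_maps(sample_maps)
print(is_mapped(0x00401000, ranges))  # True
print(is_mapped(0x00500000, ranges))  # False
```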

@peterwaller-arm
Contributor

Hi @jcgomezv,

Thanks so much for your report and for providing detailed information. Could you please update the original issue with info about your platform or distro? It's difficult to reproduce the issue without that context. It also helps future triage.

Also, it would be really helpful to know how you’re obtaining your PostgreSQL binary. Providing enough details for someone to reproduce the setup would be ideal. You said "Interesting facts are that a similar version of the binary can be instrumented correctly with no crash" - what is different about this similar binary?

Thanks again for your help!

@ilinpv
Contributor

ilinpv commented Feb 26, 2025

It looks like the correctness of the binary may have been broken by instrumentation. @yota9 @yavtuk if you have any suggestions how to debug that further that would be really appreciated.

@ilinpv
Contributor

ilinpv commented Mar 4, 2025

Are you building with -fPIE?

@jcgomezv
Author

jcgomezv commented Mar 4, 2025

Also, the only way to get past that step for an -O2 or -O3 build is to use the workaround provided earlier (--skip-funcs=brin_bloom_summary_in). The binary in this case usually dies at startup while executing calls to atexit(), and when we look at the assembly it seems to be trying to access out-of-range memory (this code is very similar to the open-source version):

#0 0x00000000047863e4 in atexit () at ../../../../src/include/port/atomics.h:440
#1 0x00000000047142dc in on_proc_exit (function=function@entry=0xc9496c, arg=arg@entry=0) at ipc.c:367
#2 0x000000000488919c in CreateLockFile (filename=0x137bd00 "postmaster.pid", amPostmaster=true, socketDir=0x134c028 "", isDDLock=true,
refName=0xfffff6053e60 "/home/gomezjc/workplace/Manfred17-2-Stable/build/RDSManfredDev/RDSManfredDev-Development/AL2_aarch64/DEV.STD.PTHREAD/build/RDSManfred/db/data") at miscinit.c:1750
#3 0x0000000004819144 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0xfffff6028c80) at postmaster.c:1198
#4 0x000000000488d8a8 in main (argc=5, argv=0xfffff6028c80) at main.c:244

Also, during optimization I see a lot of these messages:

BOLT-WARNING: empty location list detected at 0x3f5d510 for DIE at 0x0xcc1ca2a0 in CU at 0xa950413
BOLT-WARNING: empty location list detected at 0x3f5d535 for DIE at 0x0xcc1ca980 in CU at 0xa950413
BOLT-WARNING: empty location list detected at 0x3f5d55a for DIE at 0x0xcc1cab40 in CU at 0xa950413
BOLT-WARNING: empty location list detected at 0x3f5d5f9 for DIE at 0x0xcc1cb8c0 in CU at 0xa950413

@jcgomezv
Author

jcgomezv commented Mar 4, 2025

> Are you building with -fPIE?

Let me check, we do have a large number of flags

@jcgomezv
Author

jcgomezv commented Mar 4, 2025

I confirmed we do not use -fPIE

@peterwaller-arm
Contributor

peterwaller-arm commented Mar 5, 2025

Some ideas:

  • It looks like the binary has been statically linked, is that true? (In both cases, the improved case and the regressed case?)
  • Which libc (and version) are you using?
  • Can you provide a disassembly in atexit() prior to instrumentation with bolt using llvm-objdump -dr? Along with the mappings for that binary, so we can see what the bad adr looks like prior to rewriting.
  • Can you try to build/link with -fPIE?

@jcgomezv
Author

jcgomezv commented Mar 5, 2025

@peterwaller-arm let me work through these suggestions/questions:

1.-Our Postgres binary is built statically but it does use some dynamic libraries.
2.-For this build, I am using (GNU libc) 2.26
3.-For the disassembly of atexit before, yes I will fetch that and while at it I will also get the mappings for that binary
4.-I will pass -fPIE to compiler and linker and see what we get from that.

A few data points that I am not sure I mentioned before: if I compile without optimizations, I can instrument the binary and collect profile data with it. I can then generate an optimized binary from that data, but that binary still does not work. So it seems some GCC optimization is breaking BOLT, and even without it the generated binary does not work (I have to dig deeper into that failure; it is not the bad atexit from before, it is something else).

Of course, optimizing a non-optimized build is a no-go for me: if I remove -O2, my performance on some workloads falls by about 40%... :-(

@jcgomezv
Author

jcgomezv commented Mar 5, 2025

@peterwaller-arm We already use -fPIC in the code: is this significantly different from -fPIE?

@ilinpv
Contributor

ilinpv commented Mar 6, 2025

@jcgomezv -fPIC is even stronger. But it seems your binary is not fully relocatable: .fini_array is non-relocatable and the file command reports "ELF 64-bit LSB executable". For a relocatable executable, IIRC, you should get "LSB shared object" there.

@paschalis-mpeis
Member

paschalis-mpeis commented Mar 6, 2025

@jcgomezv -fPIC wouldn't make your final binary a PIE one, unless it's somehow toggled on in your build system.
So you need position independence for all your intermediate objects (PIC) and also for your final binary (PIE).

You can always verify whether your binary is a PIE using something like:

file BINARY | grep pie

For binaries as in your case, you should get:

.. ELF 64-bit LSB pie executable ..
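
Some versions of `file` print "shared object" rather than "pie executable" for a PIE binary, so the ELF header type can be read directly as a cross-check. A minimal sketch, assuming a little-endian ELF file such as the AArch64 binary discussed here; `elf_type` is a hypothetical helper name, not part of BOLT or binutils:

```shell
# elf_type FILE: print the ELF header's e_type as EXEC, DYN, or a raw number.
# Assumes a little-endian ELF file; e_type is the 2-byte field at offset 16.
elf_type() {
  # Read the two e_type bytes as unsigned decimals, e.g. "3 0".
  set -- $(od -An -tu1 -j16 -N2 "$1")
  value=$(( $1 + 256 * $2 ))
  case "$value" in
    2) echo "EXEC" ;;   # ET_EXEC: fixed-address executable (non-PIE)
    3) echo "DYN" ;;    # ET_DYN: shared object or PIE executable
    *) echo "$value" ;;
  esac
}
```

Running `elf_type postgres` on a binary linked with -pie should report DYN. Note that DYN alone does not distinguish a PIE executable from an ordinary shared library.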

@jcgomezv
Author

jcgomezv commented Mar 6, 2025

Here are the full section dumps for all the executables I generated (postgres is the relocatable executable that works fine; inst-clear is the bolt-instrumented binary, which runs fine; the other two are optimized with either perf data or bolt instrumentation data, and they do not run: they crash just after start).

postgres-inst-clear.txt
postgres-perf-nl.txt
postgres.txt
postgress-inst-1k-t.txt

@jcgomezv
Author

jcgomezv commented Mar 6, 2025

@jcgomezv -fPIC is even stronger. But seems your binary is not fully relocatable, you have .fini_array non-relocatable and "ELF 64-bit LSB executable" on file command. For relocatable executable IIRC you should get "LSB shared object" there

The strange thing is that the binaries that work are still ELF and have similar rela sections. Here is a comparison of open-source postgres vs. our version:

Image

@jcgomezv
Author

jcgomezv commented Mar 6, 2025

@peterwaller-arm I did experiment with -fPIE, but the outcome is the same: I have to pass --skip-funcs=brin_bloom_summary_in to bypass the bug described above, and the generated binary still crashes at atexit. I have collected the info for the crashing and working versions as requested, and I can tell that the relocation part was done incorrectly:

Image

Image

Image

@jcgomezv
Author

jcgomezv commented Mar 6, 2025

@jcgomezv -fPIC wouldn't make your final binary a PIE one, unless it's somehow toggled on in your build system. So you have to have position independence for all your intermediate objects (PIC) but also to your final binary (PIE).

Can always verify whether your binary is a pie using something like:

file BINARY | grep pie
For binaries as in your case, you should get:

.. ELF 64-bit LSB pie executable ..

Now I get:
file postgres
postgres: ELF 64-bit LSB shared object, ARM aarch64, version 1 (GNU/Linux), dynamically linked (uses shared libs), for GNU/Linux 3.7.0, not stripped

OK, I was able to figure out why I did not see "shared object": although compilation was being done with -fPIE, the linker was not using -pie. Once I added that to the link step, file reports a shared object, but bolt still breaks in both modes, instrumentation and optimization. And now I am not even able to generate an instrumented binary, as I get the following error:

BOLT-INFO: setting _end to 0x705387c
BOLT-INFO: setting _end to 0x705387c
BOLT-INFO: setting __bolt_runtime_start to 0x6fa88c8
BOLT-INFO: setting __bolt_runtime_fini to 0x6fa8960
BOLT-INFO: setting __hot_start to 0x4400000
BOLT-INFO: setting __hot_end to 0x64adcc0
BOLT-ERROR: unable to get new address corresponding to input address 0x5fab20 in function ExecInterpExpr/1(*2). Consider adding this function to --skip-funcs=...

If I just try --skip-funcs=ExecInterpExpr, it still breaks with the same message.

Any ideas?

@jcgomezv
Author

jcgomezv commented Mar 6, 2025

Oh, never mind the question about --skip-funcs: I see the code uses regular expressions, I will try that...

@ilinpv
Contributor

ilinpv commented Mar 6, 2025

@jcgomezv can you try to skip with llvm-bolt --skip-funcs=ExecInterpExpr/1? It is also worth checking whether you are seeing the related symptoms from #120992 (comment), with system libraries (like libgcc.a and crtbegin.o) built with pointer authentication enabled.

@jcgomezv
Author

jcgomezv commented Mar 6, 2025

Oh yes, this sounds like that other bug... let me push forward; hopefully the binaries that I generate now will work :-)

@jcgomezv
Author

jcgomezv commented Mar 6, 2025

Just tried the generated binaries: they both crash at the same point as before, atexit() :-( If I skip atexit, I still break elsewhere; I have to investigate that new failure...

@ilinpv
Contributor

ilinpv commented Mar 6, 2025

If you apply BOLT patch #120267 ( hopefully we get it merged soon ) you won’t need to skip ExecInterpExpr/1

@jcgomezv
Author

jcgomezv commented Mar 6, 2025

Oh, interestingly enough, if I skip all the functions from that other issue plus atexit, then I get a binary that seems to work. And yes, I noticed that the function above had a goto inside... let me pray and see if I can optimize the binary :-)

@jcgomezv
Author

jcgomezv commented Mar 6, 2025

I used this to get a working instrumented version:

llvm-bolt postgres -update-debug-sections --skip-funcs=brin_bloom_summary_in -instrument -o postgres-inst-clear --skip-funcs="ExecInterpExpr/1,__do_global_dtors_aux/1,init_have_lse_atomics/1,atexit/1"
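
The working invocation above passes --skip-funcs twice. A minimal sketch of keeping the names in one place and emitting a single comma-separated option; `skip_funcs_arg` is a hypothetical helper name, and the function list is the one from this comment:

```shell
# skip_funcs_arg NAME...: print one --skip-funcs option covering all given names.
skip_funcs_arg() {
  # Join the arguments with commas in a subshell so IFS stays unchanged outside.
  ( IFS=,; printf -- '--skip-funcs=%s\n' "$*" )
}

# Print the option for the functions skipped in this thread.
skip_funcs_arg brin_bloom_summary_in 'ExecInterpExpr/1' \
  '__do_global_dtors_aux/1' 'init_have_lse_atomics/1' 'atexit/1'
```

The printed option can then be substituted into the llvm-bolt command line, for example with `"$(skip_funcs_arg ...)"`, so the skip list is edited in one spot as functions are added or removed.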

@ilinpv
Contributor

ilinpv commented Mar 6, 2025

I recognize that we need more self-diagnostics and correctness checks within BOLT to enhance the user experience. This issue could serve as a valuable example for such work. Thank you for the detailed reports!

@jcgomezv
Author

jcgomezv commented Mar 7, 2025

I do appreciate all your help: I am eager to get this working because I see what we can gain from the tests with opensource postgres, I think we are one step away from a good optimized binary 🙏

@jcgomezv
Author

I just wanted to update this issue to say that I was able to optimize our binary by following the workarounds offered earlier, i.e. skipping a number of functions that the bolt optimizer does not support very well on ARM, and also building as PIE rather than only PIC as I was doing earlier. The gains so far are very small and depend heavily on perf collection. If I collect perf for a single-threaded client test, the gains are better, but if the load is noisier, coming from multiple clients, the gains are smaller. If perf is collected under a large number of clients, the perf information does not help optimization and at times results in a slower binary :-(

@paschalis-mpeis
Member

Hey @jcgomezv,

A couple of weeks ago you mentioned that you got instrumentation working. Were the numbers better?

Running perf stat on the L1 i-cache before and after optimization could help determine whether your workload setup benefits from bolt and whether bolt is making improvements. It's also possible that you need to record more samples.
Additionally, BOLT's output (including dyno-stats) could provide some insight.
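
One way to compare the two runs is misses per thousand instructions (MPKI), computed from the miss and instruction counts that perf stat reports. A minimal sketch; `mpki` is a hypothetical helper, and the perf event names in the comment are an assumption that varies by CPU and kernel:

```shell
# mpki MISSES INSTRUCTIONS: print i-cache misses per 1000 instructions.
# The two counts would come from something like
#   perf stat -e L1-icache-load-misses,instructions -p <pid> -- sleep 60
# (event names are an assumption; check `perf list` on the target machine).
mpki() {
  awk -v m="$1" -v i="$2" 'BEGIN { printf "%.2f\n", (m * 1000) / i }'
}
```

Computing this for the original and the optimized binary under the same workload gives a number that is comparable even if the two runs execute a different total number of instructions.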

@jcgomezv
Author

@paschalis-mpeis yes, I believe the instrumented version is better than using perf collection, but I am still puzzled about how the instrumented version works. Here is what I see, and maybe you can help me improve the data collection with it: I noticed that /tmp/prof.fdata gets created early in the process and then does not change much during testing, and in our case the test workload lasts long after the start of the process. Is there a way to trigger collection on the instrumented binary from a starting point where the load has already 'warmed' the process?

@paschalis-mpeis
Member

About the instrumentation issue, since postgres is a service you could use:

--instrumentation-no-counters-clear --instrumentation-sleep-time=<uint>  

To avoid including the warm-up process in instrumentation, you can:

A) Capture a perf profile that excludes warm-up code:

There are a few ways to do this. You can use perf record --delay NUM, or if you can programmatically determine when the database is ready (e.g., by monitoring logs), you can trigger your benchmarking harness along with perf record at that point.

B) Use that profile for selective instrumentation:

Once you have the profile, you can combine it with --instrument-hot-only. Since you work with a service, you’ll need to use the counters/sleep flags mentioned above.


I see that you’ve closed this issue, as it’s now more of a discussion around profiling. Feel free to follow up here or in the #bolt Discord channel.

@jcgomezv
Author

@paschalis-mpeis are the following flags passed to the instrumented binary or during the instrumentation via bolt? --instrumentation-no-counters-clear --instrumentation-sleep-time=

@paschalis-mpeis
Member

@jcgomezv these are flags passed during instrumentation via llvm-bolt, i.e.:

llvm-bolt ... --instrument \
  --instrumentation-no-counters-clear \
  --instrumentation-sleep-time=N \
  -o postgres.instrumented

This flag may also be relevant: --instrumentation-wait-forks.

Some useful entries from the manual:

$ llvm-bolt --help
  --instrument                                              - instrument code to generate accurate profile data
  --instrument-hot-only                                     - only insert instrumentation on hot functions (needs profile, default: false)
  --instrumentation-file-append-pid                         - append PID to saved profile file name (default: false)
  --instrumentation-no-counters-clear                       - Don't clear counters across dumps (use with instrumentation-sleep-time option)
  --instrumentation-sleep-time=<uint>                       - interval between profile writes (default: 0 = write only at program end).  This is useful for service workloads when you want to dump profile every X minutes or if you are killing the program and the profile is not being dumped at the end.
  --instrumentation-wait-forks                              - Wait until all forks of instrumented process will finish (use with instrumentation-sleep-time option)

@jcgomezv
Author

@paschalis-mpeis thanks a lot, yes, I found them in the instrumentation part of the manual. I also saw the section about the merge command: I need to merge profiles from various loads to optimize for all of them; otherwise I guess I am targeting only one type of load... I will try these and see if things improve.
