Crash while trying to instrument a binary in AArch64 (Assertion `MO.isImm() && "did not expect relocated expression"') #128282
Comments
Interesting facts: a similar version of the binary can be instrumented correctly with no crash, and the data collected from it can be used to get good perf improvements. The same code compiled for x86 can be instrumented correctly with no issues. GCC 10.5 is used to build the binaries. |
@llvm/issue-subscribers-bolt Author: None (jcgomezv)
BOLT-INFO: first alloc address is 0x400000
BOLT-INFO: creating new program header table at address 0x4600000, offset 0x4200000
BOLT-WARNING: debug info will be stripped from the binary. Use -update-debug-sections to keep it.
BOLT-INFO: enabling relocation mode
BOLT-INFO: forcing -jump-tables=move for instrumentation
BOLT-WARNING: 14 collisions detected while hashing binary objects. Use -v=1 to see the list.
BOLT-INFO: number of removed linker-inserted veneers: 0
BOLT-INFO: 0 out of 40985 functions in the binary (0.0%) have non-empty execution profile
BOLT-INSTRUMENTER: Number of indirect call site descriptors: 10117
BOLT-INSTRUMENTER: Number of indirect call target descriptors: 40048
BOLT-INSTRUMENTER: Number of function descriptors: 39960
BOLT-INSTRUMENTER: Number of branch counters: 336327
BOLT-INSTRUMENTER: Number of ST leaf node counters: 247871
BOLT-INSTRUMENTER: Number of direct call counters: 66698
BOLT-INSTRUMENTER: Total number of counters: 650896
BOLT-INSTRUMENTER: Total size of counters: 5207168 bytes (static alloc memory)
BOLT-INSTRUMENTER: Total size of string table emitted: 2310032 bytes in file
BOLT-INSTRUMENTER: Total size of descriptors: 37711472 bytes in file
BOLT-INSTRUMENTER: Profile will be saved to file /tmp/prof.fdata
BOLT-INFO: removed 4975 empty blocks
BOLT-INFO: UCE removed 62903 blocks and 5076444 bytes of code
BOLT-INFO: Starting stub-insertion pass
BOLT-INFO: Inserted 0 stubs in the hot area and 0 stubs in the cold area. Shared 0 times, iterated 1 times.
llvm-bolt: /home/gomezjc/OpenSource/llvm-project/llvm/lib/Target/AArch64/MCTargetDesc/AArch64MCCodeEmitter.cpp:235: unsigned int {anonymous}::AArch64MCCodeEmitter::getMachineOpValue(const llvm::MCInst&, const llvm::MCOperand&, llvm::SmallVectorImpl<llvm::MCFixup>&, const llvm::MCSubtargetInfo&) const: Assertion `MO.isImm() && "did not expect relocated expression"' failed.
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var LLVM_SYMBOLIZER_PATH to point to it):
0 llvm-bolt 0x0000000000e58b10
1 llvm-bolt 0x0000000000e56560
2 linux-vdso.so.1 0x0000ffff8fbb9834 __kernel_rt_sigreturn + 0
3 libc.so.6 0x0000ffff8f6d8834 gsignal + 180
4 libc.so.6 0x0000ffff8f6da140 abort + 352
5 libc.so.6 0x0000ffff8f6d1780
6 libc.so.6 0x0000ffff8f6d17fc
7 llvm-bolt 0x0000000000835990
8 llvm-bolt 0x00000000008373e4
9 llvm-bolt 0x0000000000ca178c
10 llvm-bolt 0x000000000160e968
11 llvm-bolt 0x0000000001610680
12 llvm-bolt 0x0000000001610ed8
13 llvm-bolt 0x0000000000f35bb8
14 llvm-bolt 0x0000000000f38a18
15 llvm-bolt 0x000000000040b428
16 libc.so.6 0x0000ffff8f6c5da4 __libc_start_main + 228
17 llvm-bolt 0x0000000000486510
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
zsh: abort (core dumped) llvm-bolt postgres -instrument -o postgres-inst |
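As a quick sanity check on the BOLT-INSTRUMENTER figures in the log above, the three per-kind counter totals add up to the reported total, and the reported size works out to 8 bytes per counter. The 8-byte width is only inferred from these numbers, not taken from the BOLT source; a minimal sketch:

```python
# Figures copied from the BOLT-INSTRUMENTER lines in the log above.
branch_counters = 336327
st_leaf_node_counters = 247871
direct_call_counters = 66698

reported_total_counters = 650896
reported_counter_bytes = 5207168

# The three counter kinds listed in the log account for the reported total.
total_counters = branch_counters + st_leaf_node_counters + direct_call_counters

# Inferred width of one counter (an assumption derived from the log,
# not a documented BOLT constant).
bytes_per_counter, remainder = divmod(reported_counter_bytes, total_counters)

print(total_counters == reported_total_counters)  # True
print(bytes_per_counter, remainder)               # 8 0
```

So the roughly 5 MB of static counter memory comes directly from the number of instrumented branches, leaf nodes and direct calls.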
@jcgomezv can you please add more details on the reproducer? Which postgres version did you use, and how exactly was the input binary compiled? I assume on the bolt side you only used |
Dear @paschalis-mpeis: The postgres version is 17.2 and compiler versions are: g++ version: aarch64-unknown-linux-gnu-g++ (GCC) 10.5.0. And that is correct, I only got to the instrumentation phase; since this fails, I could not collect perf data. -Wl,--emit-relocs was used in the link phase. I can collect full logs for the bolt run with -v if that helps, or I can recompile the bolt tool with any printouts that may help diagnose this. |
BOLT-INFO: Starting pass: clean-mc-state Thread 1 "llvm-bolt" received signal SIGABRT, Aborted. |
I instrumented the bolt code to print the opcode and see where we break, and here is what I got: AArch64MCCodeEmitter::encodeInstruction(opcode=4788 But such an instruction was processed successfully before... |
And from gdb: #5 0x0000000000a55858 in (anonymous namespace)::AArch64MCCodeEmitter::getBinaryCodeForInstr (this=0x23648d90, |
(gdb) p MI |
(gdb) p *this |
(gdb) p Fixups |
I changed the code in Bolt to display more about the failing instruction by using MInstrInfo and got this: AArch64MCCodeEmitter::encodeInstruction(opcode=7311) The instruction is expected to have the following format, but somehow my Imm value is missing as we are parsing the instruction: STPXPRE Xt1, Xt2, [Xn, #imm] |
More details on the encoding where we fail: I see five operands, but this instruction takes four: (gdb) p MI.Operands.size() |
@jcgomezv could you try llvm-bolt with and without "-instrument", check the disassembly, and see if errors occur? |
BOLT-INFO: setting size of function modf@PLT to 16 (was 0) Binary Function "brin_bloom_summary_in" after disassembly { |
It crashes at the same place, and it tells me what the instruction is; it breaks right where the immediate argument is missing: BOLT-WARNING: writable reference into the middle of the function ExecInterpExpr/1(*2) detected at address 0x35a3678 |
More info on the specific place where we crash:
Note the 4th operand seems corrupt |
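To make the failure mode concrete, here is a toy model of the check that fires in the log: a register or a plain immediate can be encoded, but an operand that is still a symbolic expression trips the assertion. The `Reg`, `Imm`, `Expr` classes and `machine_op_value` are hypothetical names for illustration only; this is not LLVM code and does not reproduce the real encoder.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical stand-ins for the three operand kinds discussed in the thread.
@dataclass(frozen=True)
class Reg:
    name: str

@dataclass(frozen=True)
class Imm:
    value: int

@dataclass(frozen=True)
class Expr:
    symbol: str  # an operand that still refers to a symbol, not a number

Operand = Union[Reg, Imm, Expr]

# Illustrative register numbers for the instruction quoted in the thread.
REG_ENCODING = {"x29": 29, "x30": 30, "sp": 31}

def machine_op_value(op: Operand) -> int:
    """Return the value to encode, or fail the way the log shows."""
    if isinstance(op, Reg):
        return REG_ENCODING[op.name]
    # Mirrors the message of the failed assertion in the log.
    assert isinstance(op, Imm), "did not expect relocated expression"
    return op.value

# stp x29, x30, [sp, #imm]! with a well-formed immediate encodes fine ...
good = [Reg("x29"), Reg("x30"), Reg("sp"), Imm(-16)]
print([machine_op_value(op) for op in good])

# ... while an expression in the immediate slot raises AssertionError.
bad = [Reg("x29"), Reg("x30"), Reg("sp"), Expr("some_symbol")]
try:
    [machine_op_value(op) for op in bad]
except AssertionError as err:
    print("assertion failed:", err)
```

This only illustrates why a corrupt or symbolic operand in the immediate position is fatal at encoding time; where that operand comes from is the actual bug under discussion.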
Yes, it looks like it broke on "00000000: stp x29, x30, [sp, #smth corrupt]" in the brin_bloom_summary_in function. As a workaround you can try to skip it with |
Hi Pavel: Thanks for the suggestion let me try this and I will report back. |
That did the trick, thanks Pavel. I was able to generate an instrumented binary with the info you provided. Hopefully the debug info here will help you guys solve the bug! |
So it seems the instrumented file I get after this fails to start: it breaks at the instruction LDR X1, [X1, #3688]. Any idea what else we can try here? |
The actual crash in the program is within the atexit() call |
Hi @jcgomezv, Thanks so much for your report and for providing detailed information. Could you please update the original issue with info about your platform or distro? It's difficult to reproduce the issue without that context. It also helps future triage. Also, it would be really helpful to know how you’re obtaining your PostgreSQL binary. Providing enough details for someone to reproduce the setup would be ideal. You said "Interesting facts are that a similar version of the binary can be instrumented correctly with no crash" - what is different about this similar binary? Thanks again for your help! |
Are you building with -fPIE? |
Also, the only way to generate code with that step for an -O2 or -O3 compile is to use the workaround provided earlier (--skip-funcs=brin_bloom_summary_in). The binary in this case usually dies at the start while executing calls to atexit(), and when we look at the assembly it seems to be trying to access out-of-range memory (this code is very similar to open source): 0 0x00000000047863e4 in atexit () at ../../../../src/include/port/atomics.h:440 . Also, during optimization I see a lot of these messages: BOLT-WARNING: empty location list detected at 0x3f5d510 for DIE at 0x0xcc1ca2a0 in CU at 0xa950413 |
Let me check, we do have a large number of flags |
I confirmed we do not use -fPIE |
Some ideas:
|
@peterwaller-arm let me work through these suggestions/questions: 1. Our Postgres binary is built statically but it does use some dynamic libraries. A few data points that I am not sure I mentioned before: if I compile without optimizations, I can instrument the binary and collect perf data via the binary. I can generate an optimized binary given that data, but the binary will still not work. So it seems some gcc optimization is breaking bolt, and even then the generated binary will not work (I have to dig deeper into the break here, but it is not the bad atexit as before; it is something else). Of course, only being able to optimize a non-optimized version is a no-go for me: if I remove O2, my perf on some workloads falls about 40%... :-( |
@peterwaller-arm We already use -fPIC in the code: is this significantly different from -fPIE? |
@jcgomezv -fPIC is even stronger. But it seems your binary is not fully relocatable: you have a non-relocatable .fini_array, and the file command reports "ELF 64-bit LSB executable". For a relocatable executable, IIRC, you should get "LSB shared object" there |
@jcgomezv You can always verify whether your binary is a PIE using something like: file BINARY | grep pie For binaries as in your case, you should get:
|
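The same distinction can be read straight from the ELF header: the `e_type` field is `ET_EXEC` (2) for a fixed-address executable and `ET_DYN` (3) for a PIE or shared object. A minimal sketch that does not depend on the `file` utility; the helper names `elf_type` and `elf_type_of_file` are made up for this example:

```python
import struct

# e_type values from the ELF specification.
ET_EXEC = 2  # fixed-address executable ("LSB executable")
ET_DYN = 3   # position-independent executable or shared object

def elf_type(header: bytes) -> str:
    """Classify an ELF image from at least its first 18 bytes."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian.
    endian = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field right after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(endian + "H", header, 16)
    if e_type == ET_EXEC:
        return "EXEC"
    if e_type == ET_DYN:
        return "DYN"
    return "other({})".format(e_type)

def elf_type_of_file(path: str) -> str:
    """Read just enough of the file to classify it."""
    with open(path, "rb") as fh:
        return elf_type(fh.read(18))
```

For example, `elf_type_of_file("postgres")` returning `"EXEC"` would match the "LSB executable" output mentioned above, while `"DYN"` is what a binary linked with `-pie` should report.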
Here are the full section dumps for all the executables I generated (postgres is the relocatable executable that works fine, inst-clear is the bolt-instrumented binary which runs fine, the other two are optimized with either perf data or bolt data; they do not run and crash just after start) postgres-inst-clear.txt |
The strange thing is that the binaries that work are still ELF and have similar rela sections; here is a comparison of open-source postgres vs. our version: |
@peterwaller-arm I did experiment with -fPIE but the outcome is the same: I have to pass --skip-funcs=brin_bloom_summary_in to bypass the bug described above, and the generated binary still crashes at atexit. I have collected the info for the crashing and working versions as requested, and I can tell that the relocation part was done incorrectly: |
Now I get: OK, I was able to figure out why I did not see a shared object: although compiling was done with -fPIE, the linker was not using -pie. Once I added this to the link step, the file shows a shared object, but bolt still breaks in both modes, instrumentation and optimization. And now I am not even able to generate an instrumented binary, as I get the following error: BOLT-INFO: setting _end to 0x705387c If I just try --skip-funcs=ExecInterpExpr it still breaks with the same message. Any ideas? |
Oh, never mind the question about --skip-funcs; I see the code uses regular expressions. I will try that... |
@jcgomezv can you try to skip with |
Oh yes, this other bug sounds like that...let me push forward hopefully binaries that I generate now work :-) |
Just tried the generated binaries: they both crash at the same point as before, atexit() :-( If I skip atexit I still break elsewhere; I have to investigate that new failure... |
If you apply BOLT patch #120267 ( hopefully we get it merged soon ) you won’t need to skip |
Oh, interestingly enough, if I skip all the functions from that other issue plus atexit, then I get a binary that seems to work. And yes, I noticed that the function above had a goto inside... let me pray and see if I can optimize the binary :-) |
I used this to get a working instrumented version: llvm-bolt postgres -update-debug-sections --skip-funcs=brin_bloom_summary_in -instrument -o postgres-inst-clear --skip-funcs="ExecInterpExpr/1,__do_global_dtors_aux/1,init_have_lse_atomics/1,atexit/1" |
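Since the skip list tends to grow as more problem functions turn up, it can help to keep it in one place and generate the command from it. A minimal sketch that uses only the flags quoted in this thread; folding both `--skip-funcs` occurrences into one comma-separated list is an assumption on my part that the two forms are equivalent, and `instrument_argv` is a made-up helper name:

```python
import shlex
from typing import List, Sequence

# Functions skipped in the working command above.
SKIP_FUNCS = [
    "brin_bloom_summary_in",
    "ExecInterpExpr/1",
    "__do_global_dtors_aux/1",
    "init_have_lse_atomics/1",
    "atexit/1",
]

def instrument_argv(binary: str, output: str, skip: Sequence[str]) -> List[str]:
    """Build an llvm-bolt instrumentation command line as an argv list."""
    argv = ["llvm-bolt", binary, "-update-debug-sections", "-instrument", "-o", output]
    if skip:
        # Assumption: one comma-separated list behaves like repeating the flag.
        argv.append("--skip-funcs=" + ",".join(skip))
    return argv

if __name__ == "__main__":
    # Print a copy-pasteable command; nothing is executed here.
    print(shlex.join(instrument_argv("postgres", "postgres-inst-clear", SKIP_FUNCS)))
```

Passing the list as argv entries also avoids shell quoting issues if a later entry is a regular expression.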
I recognize that we need more self-diagnostics and correctness checks within BOLT to enhance the user experience. This issue could serve as a valuable example for such work, thank you for the detailed reports! |
I do appreciate all your help: I am eager to get this working because I see what we can gain from the tests with opensource postgres, I think we are one step away from a good optimized binary 🙏 |
I just wanted to update this issue, stating that I was able to optimize our binary by following the workarounds offered earlier, i.e. bypassing the processing of a number of functions that the bolt optimizer does not support very well on ARM, and also compiling as PIE rather than PIC as I was doing earlier. The gains so far are very small and depend heavily on perf collection. I.e. if I collect perf for a single-threaded client test then the perf gains are better, but if the load is noisier, coming from multiple clients, then the gains are smaller. If perf is collected under a large number of clients then the perf information does not help optimize and at times results in a slower binary :-( |
Hey @jcgomezv, A couple of weeks ago you mentioned that you got instrumentation working. Were the numbers better? Running |
@paschalis-mpeis yes, I believe the instrumented version is better than using perf collection, but I am still puzzled about how the instrumented version works. Here is what I see, and maybe you can help me improve the data collection using the instrumented version: I noticed that /tmp/prof.fdata gets created early in the process and then does not change much during the testing, and in our case the test workload lasts long after the start of the process. Is there a way to trigger collection on the instrumented binary from a starting point where the load has already 'warmed the process'? |
About the instrumentation issue, since postgres is a service you could use:
To avoid including the warm-up process in instrumentation, you can:
A) Capture a perf profile that excludes warm-up code: There are a few ways to do this. You can use
B) Use that profile for selective instrumentation: Once you have the profile, you can combine it with
I see that you’ve closed this issue, as it’s now more of a discussion around profiling. Feel free to follow up here or in the #bolt Discord channel. |
@paschalis-mpeis are the following flags passed to the instrumented binary or during the instrumentation via bolt? --instrumentation-no-counters-clear --instrumentation-sleep-time= |
@jcgomezv these are flags passed during instrumentation via
llvm-bolt ... --instrument \
  --instrumentation-no-counters-clear \
  --instrumentation-sleep-time=N \
  -o postgres.instrumented
This flag may also be relevant
Some useful entries from the manual:
|
@paschalis-mpeis thanks a lot, yes found them in the instrumentation part of the manual. I also saw the section about the merge command: I need to merge perf for various loads to optimize for all of them, else I guess I am targeting one type of load...I will try these see if things improve |
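For the merge step, BOLT ships a `merge-fdata` tool that takes several `.fdata` files and writes the combined profile to standard output. A minimal sketch of driving it from a script, assuming `merge-fdata` is on `PATH`; the helper names and the file names in the usage note are hypothetical:

```python
import subprocess
from typing import List, Sequence

def merge_argv(profiles: Sequence[str]) -> List[str]:
    """Build the merge-fdata command line for the given profile files."""
    if not profiles:
        raise ValueError("need at least one .fdata profile to merge")
    return ["merge-fdata", *profiles]

def merge_profiles(profiles: Sequence[str], merged_path: str) -> None:
    """Run merge-fdata and redirect its standard output into merged_path."""
    with open(merged_path, "wb") as out:
        subprocess.run(merge_argv(profiles), stdout=out, check=True)
```

A call such as `merge_profiles(["oltp.fdata", "analytics.fdata"], "merged.fdata")` would then produce one profile covering both workloads, which can be passed to llvm-bolt in place of a single-workload profile.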