Commit d81e38f

committed
call_graph.py without filtering done
1 parent 777b742 commit d81e38f

18 files changed

+218
-28
lines changed

c/hello_world.s

+26
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,26 @@
1+
.file "hello_world.c"
2+
.section .rodata
3+
.LC0:
4+
.string "hello world"
5+
.text
6+
.globl main
7+
.type main, @function
8+
main:
9+
.LFB2:
10+
.cfi_startproc
11+
pushq %rbp
12+
.cfi_def_cfa_offset 16
13+
.cfi_offset 6, -16
14+
movq %rsp, %rbp
15+
.cfi_def_cfa_register 6
16+
movl $.LC0, %edi
17+
call puts
18+
movl $0, %eax
19+
popq %rbp
20+
.cfi_def_cfa 7, 8
21+
ret
22+
.cfi_endproc
23+
.LFE2:
24+
.size main, .-main
25+
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4"
26+
.section .note.GNU-stack,"",@progbits

c/implementations.md

+14
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,20 @@ FreeBSD moved to it in 2012: <http://unix.stackexchange.com/questions/49906/why-
1616

1717
Sony PS4 (2013 Q4, FreeBSD based) moved to it while PS3 used GCC.
1818

19+
The great advantages of this are:
20+
21+
- better tooling than GNU: Python for scripting, CMake and Doxygen doc instead of `.exp`, Autotools and Texinfo
22+
- clearer design, since it was made 20 years later and it used what was learned
23+
- can be used as a library. This is a classic problem of big CLI utilities: they were not designed to be used programmatically as a library, and it is hard to retrofit them to do so. GCC is now moving in that direction as well.
24+
25+
### Small Device C Compiler
26+
27+
<https://en.wikipedia.org/wiki/Small_Device_C_Compiler>
28+
29+
Targets mostly microcontrollers, which GCC supports less well: <http://sourceforge.net/p/sdcc/mailman/message/31601719/>
30+
31+
GPL.
32+
1933
### CompCert C compiler
2034

2135
<http://compcert.inria.fr/compcert-C.html>

cflow.md

+2
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,8 @@ Make static call graphs.
44

55
GNU.
66

7+
Neither perfect nor ultra-powerful, but it comes in handy when reading code.
8+
79
<https://en.wikipedia.org/wiki/GNU_cflow>
810

911
<http://www.gnu.org/software/cflow/>

gcc/source-tree.md

+4
Original file line numberDiff line numberDiff line change
@@ -207,3 +207,7 @@ Virtual table verification:
207207
- <https://sunglint.wordpress.com/2013/06/13/vtv/>
208208

209209
Security feature.
210+
211+
## Vocabulary
212+
213+
- leaf function: a function that does not call any other function. Optimizations are possible for such functions, like not decrementing `rsp`: <http://stackoverflow.com/questions/13201644/why-does-the-x86-64-gcc-function-prologue-allocate-less-stack-than-the-local-var>, and, on x86-64, not saving parameters passed in registers to the stack.

gdb/README.md

+4-1
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,8 @@
22

33
GNU debugger.
44

5+
Tested on 7.7.1 unless mentioned otherwise.
6+
57
1. [Getting started](getting-started.md)
68
1. [Introduction](introduction.md)
79
1. [Invocation](invocation.md)
@@ -30,7 +32,8 @@ GNU debugger.
3032
1. [read_var()](read_var.py)
3133
1. [Command](command)
3234
1. [argv-wrapper](argv-wrapper)
33-
1. [Call graph](call_graph_rbreak.py)
35+
1. [Call graph](call_graph.py)
36+
1. [Call graph rbreak](call_graph_rbreak.py)
3437
1. [Continue until instruction](continue_instruction.py)
3538
1. [Continue until return](continue_return.py)
3639
1. [Break on return](break_return.py)

gdb/break_return.py

+1
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@ def invoke(self, arg, from_tty):
2020
while block:
2121
if block.function:
2222
break
23+
block = block.superblock
2324
start = block.start
2425
end = block.end
2526
arch = frame.architecture()

gdb/breakpoint.py

+29
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
"""
2+
## Breakpoint
3+
4+
Create breakpoints from the python API.
5+
6+
TODO: create a breakpoint with the Python API without outputting anything to stdout?
7+
"""
8+
9+
gdb.execute('file big_function.out', to_string=True)
10+
11+
# Noisy.
12+
gdb.Breakpoint('main')
13+
gdb.Breakpoint('main', internal=True)
14+
15+
# Noisy.
16+
class Breakpoint0(gdb.Breakpoint):
17+
def stop (self):
18+
"""
19+
Action to take when the breakpoint is reached.
20+
"""
21+
gdb.write('0\n')
22+
# Continue automatically.
23+
return False
24+
# Return True here instead to actually stop.
26+
Breakpoint0('main')
27+
28+
gdb.execute('run')
29+
gdb.execute('continue')

gdb/call_graph.py

+22-14
Original file line numberDiff line numberDiff line change
@@ -12,12 +12,11 @@
1212
"""
1313

1414
gdb.execute('file call_graph_py.out', to_string=True)
15-
# rbreak before start to ignore dynamically linked stdlib functions.
16-
gdb.execute('set confirm off')
1715
gdb.execute('start', to_string=True)
1816
depth_string = 4 * ' '
1917
thread = gdb.inferiors()[0].threads()[0]
20-
while thread.is_valid():
18+
disassembled_functions = set()
19+
while True:
2120
frame = gdb.selected_frame()
2221
symtab = frame.find_sal().symtab
2322

@@ -30,7 +29,6 @@
3029
# Not present for files without debug symbols.
3130
source_path = '???'
3231
if symtab:
33-
#source_path = symtab.fullname()
3432
source_path = symtab.filename
3533

3634
# Not present for files without debug symbols.
@@ -46,22 +44,32 @@
4644
if symbol.is_argument:
4745
args += '{} = {}, '.format(symbol.name, symbol.value(frame))
4846

49-
# Mark new breakpoints.
50-
while block:
51-
if block.function:
52-
break
47+
# Put a breakpoint on the address of every function called from this function.
48+
# Only do that the first time we enter a function (TODO implement.)
5349
start = block.start
54-
end = block.end
55-
arch = frame.architecture()
56-
pc = gdb.selected_frame().pc()
57-
instructions = arch.disassemble(start, end - 1)
58-
for instruction in instructions:
59-
print('{:x} {}'.format(instruction['addr'], instruction['asm']))
50+
if start not in disassembled_functions:
51+
disassembled_functions.add(start)
52+
end = block.end
53+
arch = frame.architecture()
54+
pc = gdb.selected_frame().pc()
55+
instructions = arch.disassemble(start, end - 1)
56+
for instruction in instructions:
57+
# This is UGLY. I wish there was a disassembly Python interface to GDB,
58+
# like https://github.com/aquynh/capstone which allows me to extract
59+
# the opcode without parsing.
60+
if instruction['asm'].split()[0] == 'callq':
61+
gdb.Breakpoint('*{}'.format(instruction['addr']), internal=True)
6062

6163
print('{}{} : {} : {}'.format(
6264
stack_depth * depth_string,
6365
source_path,
6466
frame.name(),
6567
args
6668
))
69+
# We are at the call instruction.
6770
gdb.execute('continue', to_string=True)
71+
if thread.is_valid():
72+
# We are at the first instruction of the called function.
73+
gdb.execute('stepi', to_string=True)
74+
else:
75+
break
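
The string parsing that the comment above complains about can be isolated in a small helper and exercised without GDB. This is a sketch under assumptions: `call_addresses` is a hypothetical name, the input mimics the list of dictionaries with `addr` and `asm` keys that `arch.disassemble()` returns in the script, and the sample instructions are invented. Accepting `call` as well as `callq` is an addition, since the mnemonic printed may differ between disassembler versions.

```python
def call_addresses(instructions, mnemonics=('call', 'callq')):
    """Return the addresses of the call instructions in a disassembly listing."""
    addresses = []
    for instruction in instructions:
        fields = instruction['asm'].split()
        # Skip empty text and compare only the first word (the mnemonic).
        if fields and fields[0] in mnemonics:
            addresses.append(instruction['addr'])
    return addresses


# Invented sample shaped like the script's disassembly output.
sample = [
    {'addr': 0x400531, 'asm': 'mov    $0x4005d4,%edi'},
    {'addr': 0x400536, 'asm': 'callq  0x400410 <puts@plt>'},
    {'addr': 0x40053b, 'asm': 'mov    $0x0,%eax'},
]
for address in call_addresses(sample):
    # Inside GDB, this string would be passed to gdb.Breakpoint(..., internal=True).
    print('*{}'.format(address))
```

Keeping the parsing in one function means that a switch to a real disassembly library later would only touch this helper.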

gdb/call_graph_rbreak.py

+18-5
Original file line numberDiff line numberDiff line change
@@ -7,14 +7,27 @@
77
88
- http://stackoverflow.com/questions/9549693/gdb-list-of-all-function-calls-made-in-an-application
99
- http://stackoverflow.com/questions/311948/make-gdb-print-control-flow-of-functions-as-they-are-called
10+
11+
What happens on the output:
12+
13+
- First line is:
14+
15+
../csu/init-first.c : _init : argc = 1, argv = 0x7fffffffd728, envp = 0x7fffffffd738
16+
17+
It actually does get called before the `_start`:
18+
http://stackoverflow.com/questions/31379422/why-is-init-from-glibcs-csu-init-first-c-called-before-start-even-if-start-i
19+
20+
I think it has a depth greater than one because the callees are not broken on:
21+
they are not part of the executable.
1022
"""
1123

12-
gdb.execute('file cc1', to_string=True)
13-
gdb.execute('rbreak .', to_string=True)
14-
gdb.execute('set args hello_world.c', to_string=True)
15-
# rbreak before start to ignore dynamically linked stdlib functions.
24+
gdb.execute('file ./call_graph_py.out', to_string=True)
1625
gdb.execute('set confirm off')
17-
gdb.execute('start', to_string=True)
26+
# rbreak before run to ignore dynamically linked stdlib functions, which take too long to break on.
27+
# If we did it after run, we would also break inside stdlib functions, which is often not what we want,
28+
# since we already understand them.
29+
gdb.execute('rbreak .', to_string=True)
30+
gdb.execute('run', to_string=True)
1831
depth_string = 4 * ' '
1932
thread = gdb.inferiors()[0].threads()[0]
2033
while thread.is_valid():

gdb/commands.md

+37
Original file line numberDiff line numberDiff line change
@@ -87,6 +87,8 @@ The program will run until it reaches:
8787

8888
Like run, but also add a temporary (deleted once hit) breakpoint at `main` and stop there.
8989

90+
Note that `main` is not the actual executable entry point, see also: <http://stackoverflow.com/questions/10483544/stopping-at-the-first-machine-code-instruction-in-gdb>
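
To see which address the ELF header really names as the entry point, the `e_entry` field can be read directly. The following is a minimal sketch, not part of the original notes: the helper name `read_entry_point` is hypothetical, and it only assumes the standard ELF header layout (`e_entry` at byte offset 24, 4 bytes wide for 32-bit files and 8 bytes wide for 64-bit files).

```python
import struct


def read_entry_point(header):
    """Return e_entry from the first bytes of an ELF file (hypothetical helper)."""
    if header[:4] != b'\x7fELF':
        raise ValueError('not an ELF file')
    # e_ident[4] (EI_CLASS): 1 = 32-bit, 2 = 64-bit.
    # e_ident[5] (EI_DATA): 1 = little endian, 2 = big endian.
    endianness = '<' if header[5] == 1 else '>'
    field_format = 'I' if header[4] == 1 else 'Q'
    # e_entry follows e_ident (16 bytes), e_type (2), e_machine (2) and e_version (4).
    (entry,) = struct.unpack_from(endianness + field_format, header, 24)
    return entry
```

Calling it on the first 64 bytes of an executable (for example `open(path, 'rb').read(64)`, where `path` is whichever binary is being debugged) gives the address where execution starts, which can then be used with `b *ADDRESS`.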
91+
9092
### q
9193

9294
### quit
@@ -268,6 +270,32 @@ Set breakpoint at address:
268270

269271
b *0x400440
270272

273+
If you set a breakpoint in the middle of an instruction, the program may segfault. E.g., supposing the instruction at `0x400440` is 2 bytes long, then:
274+
275+
b *0x400441
276+
c
277+
c
278+
279+
may segfault. TODO why? Linked to hardware breakpoints I imagine.
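
One plausible explanation, stated here as an assumption and not confirmed by these notes: on x86, a software breakpoint is inserted by overwriting one byte at the target address with the `int3` opcode (`0xcc`). If that byte lies inside an instruction, it becomes part of that instruction's encoding whenever execution enters the instruction from its real start, so the CPU runs a different instruction than the one the compiler emitted. The pure Python sketch below only simulates the byte patch; the helper name and the example bytes are illustrative.

```python
INT3 = 0xcc  # x86 opcode used for software breakpoints


def insert_software_breakpoint(code, offset):
    """Return a copy of code with the byte at offset replaced by int3."""
    patched = bytearray(code)
    patched[offset] = INT3
    return bytes(patched)


# b8 00 00 00 00 encodes `mov $0x0,%eax`: opcode b8 followed by a 32-bit immediate.
mov_eax_0 = bytes.fromhex('b800000000')

# Breakpoint on the first byte: the CPU traps before executing the instruction.
at_start = insert_software_breakpoint(mov_eax_0, 0)

# Breakpoint one byte later: the opcode is intact but the immediate is now 0xcc,
# so the instruction silently becomes `mov $0xcc,%eax` when run from its start.
mid_instruction = insert_software_breakpoint(mov_eax_0, 1)

print(at_start.hex(), mid_instruction.hex())
```

In this example the corrupted instruction only loads a wrong value. With other instructions the patched byte can change an address or a register operand, which is one way a later segfault could arise.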
280+
281+
`break function` does not break on the very first instruction of the function, but rather after the prologue: <http://stackoverflow.com/questions/25545994/how-does-gdb-determine-function-entry-points>. GDB guesses prologues even without debug information. TODO can debug information contain prologue information?
282+
283+
#### Dynamically loaded library breakpoints
284+
285+
When you do:
286+
287+
break printf
288+
289+
before running the program, what happens depends on the value of:
290+
291+
show breakpoint pending
292+
293+
which can be set with:
294+
295+
set breakpoint pending on|off|auto
296+
297+
The default is `auto`, which despite its name is not automatic: it asks for confirmation.
298+
271299
### watch
272300

273301
### rwatch
@@ -304,6 +332,8 @@ Break at all function calls:
304332

305333
If you do this after running `start`, it will add hundreds of breaks and take a noticeable amount (e.g. 30 seconds) of time for a hello world because of the dynamic library functions. If done before `start`, those will not be seen.
306334

335+
This will also pick up the execution of the loader which the kernel points to before `_start`: <http://stackoverflow.com/questions/31379422/why-is-init-from-glibcs-csu-init-first-c-called-before-start-even-if-start-i/31387656#31387656>
336+
307337
### dis
308338

309339
### disable
@@ -742,6 +772,13 @@ Current line every time the debugger stops:
742772

743773
set disassemble-next-line on
744774

775+
Without debug information in an unstripped ELF:
776+
777+
- uses the `st_size` of the symbol being disassembled to determine where to stop
778+
- without arguments, finds the current function by searching for the closest symbol that contains `$pc` and has `st_size` set
779+
780+
Being a `STT_FUNC` does not seem required.
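
The lookup described above can be sketched in plain Python. This is only an illustration of the idea, not GDB's actual implementation: the `Symbol` tuple and `find_enclosing_symbol` are hypothetical names, and the addresses in the usage example are made up.

```python
from collections import namedtuple

# Stand-in for an ELF symbol table entry: name, st_value (start address), st_size.
Symbol = namedtuple('Symbol', ['name', 'st_value', 'st_size'])


def find_enclosing_symbol(symbols, pc):
    """Return the symbol whose [st_value, st_value + st_size) range contains pc."""
    candidates = [
        s for s in symbols
        if s.st_size > 0 and s.st_value <= pc < s.st_value + s.st_size
    ]
    if not candidates:
        return None
    # Prefer the closest symbol, i.e. the one that starts nearest below pc.
    return max(candidates, key=lambda s: s.st_value)


symbols = [
    Symbol('_start', 0x400440, 42),
    Symbol('main', 0x40052d, 21),
    Symbol('no_size', 0x400530, 0),
]
symbol = find_enclosing_symbol(symbols, 0x400531)
# Disassembly would then run from st_value up to st_value + st_size.
print(symbol.name, hex(symbol.st_value), hex(symbol.st_value + symbol.st_size))
```

Symbols with a zero `st_size` are skipped because they give no end address to stop at, which matches the observation that `st_size` must be set.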
781+
745782
### x
746783

747784
Examine 4 bytes of memory:

gdb/disas.py

+15-3
Original file line numberDiff line numberDiff line change
@@ -1,15 +1,23 @@
11
"""
22
## disas
33
4-
Python implementation of disas
4+
Python implementation of the `disas` command.
55
"""
66

77
def disas():
88
frame = gdb.selected_frame()
9+
# TODO: make this work for files without debugging info.
10+
# block() is only available with debug information.
11+
#
12+
# What we need is to use symbol information instead.
13+
# ELF symbol table entries have the initial address
14+
# and the size attributes, so that should be enough.
915
block = frame.block()
16+
# Find the current function if in an inner block.
1017
while block:
1118
if block.function:
1219
break
20+
block = block.superblock
1321
start = block.start
1422
end = block.end
1523
arch = frame.architecture()
@@ -18,8 +26,12 @@ def disas():
1826
for instruction in instructions:
1927
print('{:x} {}'.format(instruction['addr'], instruction['asm']))
2028

21-
gdb.execute('file if_else.out', to_string=True)
29+
gdb.execute('file disas_py.out', to_string=True)
2230
gdb.execute('start', to_string=True)
2331
disas()
32+
33+
# Test from the inner block.
34+
gdb.execute('break 9', to_string=True)
35+
gdb.execute('continue', to_string=True)
2436
print()
25-
gdb.execute('disas')
37+
disas()
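
The loop that climbs from an inner block to the enclosing function can be tried outside GDB with stand-in objects. This is a sketch, not the script itself: `FakeBlock` and `enclosing_function_block` are hypothetical names, and `FakeBlock` only mimics the `function` and `superblock` attributes used above. Without the `block = block.superblock` step the loop would never terminate when started in an inner block.

```python
class FakeBlock:
    """Stand-in exposing the two block attributes the loop relies on."""

    def __init__(self, function=None, superblock=None):
        self.function = function
        self.superblock = superblock


def enclosing_function_block(block):
    """Walk up through superblocks until a block that belongs to a function."""
    while block:
        if block.function:
            return block
        block = block.superblock
    return None


function_block = FakeBlock(function='main')
inner_block = FakeBlock(superblock=function_block)
print(enclosing_function_block(inner_block).function)
```

Returning `None` when no function block is found makes the missing-debug-information case explicit instead of failing later on `block.start`.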

gdb/disas_py.c

+15
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,15 @@
1+
#include <stdio.h>
2+
#include <time.h>
3+
4+
int main() {
5+
int i;
6+
i = time(NULL);
7+
/* This should generate an inner block. */
8+
{
9+
int i;
10+
i = time(NULL);
11+
printf("%d\n", i);
12+
}
13+
printf("%d\n", i);
14+
return 0;
15+
}

gdb/hello_world.c

+7
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
#include <stdio.h>
2+
#include <stdlib.h>
3+
4+
int main() {
5+
puts("hello world");
6+
return EXIT_SUCCESS;
7+
}

gdb/internals.md

+4
Original file line numberDiff line numberDiff line change
@@ -10,3 +10,7 @@ Hardware vs software breakpoints: <http://stackoverflow.com/questions/8878716/wh
1010
## Vocabulary
1111

1212
- `sal`: symtable and line. Both are often passed around together. TODO why.
13+
14+
## Source tree
15+
16+
- `cli/cli-cmds.c`: defines the built-in commands. Good place to start probing. Not *all* commands are there however. Grep for strings, e.g. `"break"`.

gdb/python-interface.md

+6-2
Original file line numberDiff line numberDiff line change
@@ -69,6 +69,10 @@ Exit with Ctrl + D.
6969

7070
## Bibliography
7171

72-
2008 tutorial: <https://sourceware.org/gdb/wiki/PythonGdbTutorial>
72+
- `help gdb` from the python shell. Great way to see what the API contains. Then read the `.texi` docs for the details.
7373

74-
Library that uses it a lot: <https://github.com/longld/peda/>
74+
- <https://sourceware.org/gdb/wiki/PythonGdbTutorial> 2008 tutorial by the API author. Somewhat outdated.
75+
76+
- <https://github.com/longld/peda/> famous library that uses it a lot
77+
78+
- <https://github.com/stephenbradshaw/pygdbdis/blob/master/pygdbdis.py> small library that also uses it

glibc/crt1.md

+10
Original file line numberDiff line numberDiff line change
@@ -70,3 +70,13 @@ Come from `libgcc/crtstuff.c`.
7070
## Scrt1
7171

7272
<http://stackoverflow.com/questions/16436035/whats-the-usage-of-mcrt1-o-and-scrt1-o>
73+
74+
## init-first
75+
76+
TODO. This seems to be the actual initial code that is run, *not* `_start`, even though `_start` is set in the ELF header as the entry point. What is going on?
77+
78+
Just try `b _init` in `gdb`.
79+
80+
## Bibliography
81+
82+
- <http://dbp-consulting.com/tutorials/debugging/linuxProgramStartup.html>
