= ELF Hello World Tutorial
{c}
{numbered}
{scope}
{tag=articles}
Introductory analysis of a simple example of the <executable and Linkable Format>.
Extracted from https://stackoverflow.com/a/30648229/895245[this Stack Overflow answer].
= Introduction
{parent=elf hello world}
ELF is the dominant file format for Linux. It competes with Mach-O for OS X and PE for Windows.
ELF supersedes `.coff`, which supersedes `a.out`.
= Standards
{parent=Introduction}
ELF is specified by the https://en.wikipedia.org/wiki/Linux_Standard_Base[LSB]:
* core generic: https://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-generic/LSB-Core-generic/elf-generic.html
* core AMD64: https://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-AMD64/LSB-Core-AMD64/book1.html
The LSB basically links to other standards with minor extensions, in particular:
* Generic (both by https://en.wikipedia.org/wiki/Santa_Cruz_Operation[SCO]):
* System V ABI 4.1 (1997) http://www.sco.com/developers/devspecs/gabi41.pdf, no 64 bit, although a magic number is reserved for it. Same for core files. \i[This] is the first document you should look at when searching for information.
* System V ABI Update DRAFT 17 (2003) http://www.sco.com/developers/gabi/2003-12-17/contents.html, adds 64 bit. Only updates chapters 4 and 5 of the previous document: the others remain valid and are still referenced.
* Architecture specific (by the processor vendor):
* IA-32: https://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-IA32/LSB-Core-IA32/elf-ia32.html, points mostly to http://www.sco.com/developers/devspecs/abi386-4.pdf
* AMD64: https://refspecs.linuxfoundation.org/LSB_4.1.0/LSB-Core-AMD64/LSB-Core-AMD64/elf-amd64.html, points mostly to http://www.x86-64.org/documentation/abi.pdf
A handy summary can be found at:
``
man elf
``
= How to learn
{parent=Introduction}
Spin like mad between:
* standards
* high level generators. We use the <assembler (computing)> `as` and <linker (computing)> `ld`.
* hexdumps
* file inspectors. We use `readelf`. It makes the ELF file faster to read by turning it into human readable output. But you must have seen one byte-by-byte example first, and think about how the `readelf` output maps to the standard.
* low-level generators: stand-alone libraries that let you control every field of the ELF files you generate. https://github.com/BR903/ELFkickers, https://github.com/sqall01/ZwoELF and many more on GitHub.
* consumer: the `exec` system call of the Linux kernel can parse ELF files to start processes: https://github.com/torvalds/linux/blob/v4.11/fs/binfmt_elf.c, https://stackoverflow.com/questions/8352535/how-does-kernel-get-an-executable-binary-file-running-under-linux/31394861#31394861
= Specified file formats
{parent=Introduction}
The ELF standard specifies multiple file formats:
* Object files (`.o`).
Intermediate step to generating executables and other formats:
``
Source code
|
| Compilation
|
v
Object file
|
| Linking
|
v
Executable
``
Object files exist to make compilation faster: with `make`, we only have to recompile the modified source files based on timestamps.
We have to do the linking step every time, but it is much less expensive.
* Executable files (no standard Linux extension).
This is what the Linux kernel can actually run.
* Archive files (`.a`).
Libraries meant to be embedded into executables during the Linking step.
* Shared object files (`.so`).
Libraries meant to be loaded when the executable starts running.
* Core dumps.
Such files may be generated by the Linux kernel when the program does naughty things, e.g. segfault.
They exist to help debug the program.
In this tutorial, we consider only object and executable files.
= Implementations
{parent=Introduction}
* Compiler toolchains generate and read ELF files.
Sane compilers should use a separate standalone library to do the dirty work. E.g., Binutils uses BFD (in-tree and canonical source).
* Operating systems read and run ELF files.
Kernels cannot link to a library nor use the C stdlib, so they are more likely to implement it themselves.
This is the case for the Linux kernel 4.2, which implements it in the file `fs/binfmt_elf.c`.
* Specialized libraries. Examples:
* https://github.com/eliben/pyelftools. By a hardcore Googler: https://plus.google.com/+EliBenderskyGplus/posts
* https://sourceforge.net/projects/elftoolchain
= Minimal ELF file
{parent=elf hello world}
It is non-trivial to determine the smallest legal ELF file, or the smallest one that will do something trivial in Linux.
Some impressive attempts:
* https://codegolf.stackexchange.com/questions/5696/shortest-elf-for-hello-world-n
* https://www.muppetlabs.com/~breadbox/software/tiny/
* http://timelessname.com/elfbin/
In this example we will consider a saner `hello world` example that will better capture real life cases.
= Generate the example
{parent=elf hello world}
Let's break down a minimal runnable Linux x86-64 example:
hello_world.asm
``
section .data
    hello_world db "Hello world!", 10
    hello_world_len equ $ - hello_world

section .text
    global _start
    _start:
        ; write(1, hello_world, hello_world_len): syscall 1 on x86-64
        mov rax, 1
        mov rdi, 1
        mov rsi, hello_world
        mov rdx, hello_world_len
        syscall

        ; exit(0): syscall 60 on x86-64
        mov rax, 60
        mov rdi, 0
        syscall
``
Compiled with:
``
nasm -w+all -f elf64 -o 'hello_world.o' 'hello_world.asm'
ld -o 'hello_world.out' 'hello_world.o'
``
TODO: use a minimal linker script with `-T` to be more precise and minimal.
Versions:
* NASM 2.10.09
* Binutils version 2.24 (contains `ld`)
* Ubuntu 14.04
We don't use a C program as that would complicate the analysis: that will be level 2 :-)
= Object hd
{parent=elf hello world}
Running:
``
hd hello_world.o
``
gives:
``
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 |..>.............|
00000020 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000030 00 00 00 00 40 00 00 00 00 00 40 00 07 00 03 00 |....@.....@.....|
00000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000080 01 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 |................|
000000a0 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000b0 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000c0 07 00 00 00 01 00 00 00 06 00 00 00 00 00 00 00 |................|
000000d0 00 00 00 00 00 00 00 00 10 02 00 00 00 00 00 00 |................|
000000e0 27 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |'...............|
000000f0 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000100 0d 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
00000110 00 00 00 00 00 00 00 00 40 02 00 00 00 00 00 00 |........@.......|
00000120 32 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |2...............|
00000130 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000140 17 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 |................|
00000150 00 00 00 00 00 00 00 00 80 02 00 00 00 00 00 00 |................|
00000160 a8 00 00 00 00 00 00 00 05 00 00 00 06 00 00 00 |................|
00000170 04 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00 |................|
00000180 1f 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
00000190 00 00 00 00 00 00 00 00 30 03 00 00 00 00 00 00 |........0.......|
000001a0 34 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |4...............|
000001b0 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001c0 27 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 |'...............|
000001d0 00 00 00 00 00 00 00 00 70 03 00 00 00 00 00 00 |........p.......|
000001e0 18 00 00 00 00 00 00 00 04 00 00 00 02 00 00 00 |................|
000001f0 04 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00 |................|
00000200 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 0a 00 00 00 |Hello world!....|
00000210 b8 01 00 00 00 bf 01 00 00 00 48 be 00 00 00 00 |..........H.....|
00000220 00 00 00 00 ba 0d 00 00 00 0f 05 b8 3c 00 00 00 |............<...|
00000230 bf 00 00 00 00 0f 05 00 00 00 00 00 00 00 00 00 |................|
00000240 00 2e 64 61 74 61 00 2e 74 65 78 74 00 2e 73 68 |..data..text..sh|
00000250 73 74 72 74 61 62 00 2e 73 79 6d 74 61 62 00 2e |strtab..symtab..|
00000260 73 74 72 74 61 62 00 2e 72 65 6c 61 2e 74 65 78 |strtab..rela.tex|
00000270 74 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |t...............|
00000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000290 00 00 00 00 00 00 00 00 01 00 00 00 04 00 f1 ff |................|
000002a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000002b0 00 00 00 00 03 00 01 00 00 00 00 00 00 00 00 00 |................|
000002c0 00 00 00 00 00 00 00 00 00 00 00 00 03 00 02 00 |................|
000002d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000002e0 11 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 |................|
000002f0 00 00 00 00 00 00 00 00 1d 00 00 00 00 00 f1 ff |................|
00000300 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000310 2d 00 00 00 10 00 02 00 00 00 00 00 00 00 00 00 |-...............|
00000320 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000330 00 68 65 6c 6c 6f 5f 77 6f 72 6c 64 2e 61 73 6d |.hello_world.asm|
00000340 00 68 65 6c 6c 6f 5f 77 6f 72 6c 64 00 68 65 6c |.hello_world.hel|
00000350 6c 6f 5f 77 6f 72 6c 64 5f 6c 65 6e 00 5f 73 74 |lo_world_len._st|
00000360 61 72 74 00 00 00 00 00 00 00 00 00 00 00 00 00 |art.............|
00000370 0c 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 |................|
00000380 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000390
``
= Executable hd
{parent=elf hello world}
Running:
``
hd hello_world.out
``
gives:
``
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 3e 00 01 00 00 00 b0 00 40 00 00 00 00 00 |..>.......@.....|
00000020 40 00 00 00 00 00 00 00 10 01 00 00 00 00 00 00 |@...............|
00000030 00 00 00 00 40 00 38 00 02 00 40 00 06 00 03 00 |....@.8...@.....|
00000040 01 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 |................|
00000050 00 00 40 00 00 00 00 00 00 00 40 00 00 00 00 00 |..@.......@.....|
00000060 d7 00 00 00 00 00 00 00 d7 00 00 00 00 00 00 00 |................|
00000070 00 00 20 00 00 00 00 00 01 00 00 00 06 00 00 00 |.. .............|
00000080 d8 00 00 00 00 00 00 00 d8 00 60 00 00 00 00 00 |..........`.....|
00000090 d8 00 60 00 00 00 00 00 0d 00 00 00 00 00 00 00 |..`.............|
000000a0 0d 00 00 00 00 00 00 00 00 00 20 00 00 00 00 00 |.......... .....|
000000b0 b8 01 00 00 00 bf 01 00 00 00 48 be d8 00 60 00 |..........H...`.|
000000c0 00 00 00 00 ba 0d 00 00 00 0f 05 b8 3c 00 00 00 |............<...|
000000d0 bf 00 00 00 00 0f 05 00 48 65 6c 6c 6f 20 77 6f |........Hello wo|
000000e0 72 6c 64 21 0a 00 2e 73 79 6d 74 61 62 00 2e 73 |rld!...symtab..s|
000000f0 74 72 74 61 62 00 2e 73 68 73 74 72 74 61 62 00 |trtab..shstrtab.|
00000100 2e 74 65 78 74 00 2e 64 61 74 61 00 00 00 00 00 |.text..data.....|
00000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000150 1b 00 00 00 01 00 00 00 06 00 00 00 00 00 00 00 |................|
00000160 b0 00 40 00 00 00 00 00 b0 00 00 00 00 00 00 00 |..@.............|
00000170 27 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |'...............|
00000180 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000190 21 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 |!...............|
000001a0 d8 00 60 00 00 00 00 00 d8 00 00 00 00 00 00 00 |..`.............|
000001b0 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001c0 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000001d0 11 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
000001e0 00 00 00 00 00 00 00 00 e5 00 00 00 00 00 00 00 |................|
000001f0 27 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |'...............|
00000200 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000210 01 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00 |................|
00000220 00 00 00 00 00 00 00 00 90 02 00 00 00 00 00 00 |................|
00000230 08 01 00 00 00 00 00 00 05 00 00 00 07 00 00 00 |................|
00000240 08 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00 |................|
00000250 09 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00 |................|
00000260 00 00 00 00 00 00 00 00 98 03 00 00 00 00 00 00 |................|
00000270 4c 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |L...............|
00000280 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000002a0 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00 |................|
000002b0 b0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 |..@.............|
000002c0 00 00 00 00 03 00 02 00 d8 00 60 00 00 00 00 00 |..........`.....|
000002d0 00 00 00 00 00 00 00 00 01 00 00 00 04 00 f1 ff |................|
000002e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000002f0 11 00 00 00 00 00 02 00 d8 00 60 00 00 00 00 00 |..........`.....|
00000300 00 00 00 00 00 00 00 00 1d 00 00 00 00 00 f1 ff |................|
00000310 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000320 00 00 00 00 04 00 f1 ff 00 00 00 00 00 00 00 00 |................|
00000330 00 00 00 00 00 00 00 00 2d 00 00 00 10 00 01 00 |........-.......|
00000340 b0 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 |..@.............|
00000350 34 00 00 00 10 00 02 00 e5 00 60 00 00 00 00 00 |4.........`.....|
00000360 00 00 00 00 00 00 00 00 40 00 00 00 10 00 02 00 |........@.......|
00000370 e5 00 60 00 00 00 00 00 00 00 00 00 00 00 00 00 |..`.............|
00000380 47 00 00 00 10 00 02 00 e8 00 60 00 00 00 00 00 |G.........`.....|
00000390 00 00 00 00 00 00 00 00 00 68 65 6c 6c 6f 5f 77 |.........hello_w|
000003a0 6f 72 6c 64 2e 61 73 6d 00 68 65 6c 6c 6f 5f 77 |orld.asm.hello_w|
000003b0 6f 72 6c 64 00 68 65 6c 6c 6f 5f 77 6f 72 6c 64 |orld.hello_world|
000003c0 5f 6c 65 6e 00 5f 73 74 61 72 74 00 5f 5f 62 73 |_len._start.__bs|
000003d0 73 5f 73 74 61 72 74 00 5f 65 64 61 74 61 00 5f |s_start._edata._|
000003e0 65 6e 64 00 |end.|
000003e4
``
= Global file structure
{parent=elf hello world}
An ELF file contains the following parts:
* ELF header. Points to the position of the section header table and the program header table.
* Section header table (optional on executable). Contains `e_shnum` section headers, each pointing to the position of a section.
* N sections, with `N <= e_shnum` (optional on executable)
* Program header table (only on executable). Contains `e_phnum` program headers, each pointing to the position of a segment.
* N segments, with `N <= e_phnum` (only on executable)
The order of those parts is \i[not] fixed: the only fixed part is the ELF header, which must be the first thing in the file. The generic docs say:
> Although the figure shows the program header table immediately after the ELF header, and the section header table following the sections, actual files may differ. Moreover, sections and segments have no specified order. Only the ELF header has a fixed position in the file.
In pictures: sample object file with three sections:
``
              +-------------------+
              | ELF header        |---+
  +---------> +-------------------+   | e_shoff
  |           |                   |<--+
  | Section   | Section header 0  |
  |           |                   |---+ sh_offset
  | Header    +-------------------+   |
  |           | Section header 1  |---|--+ sh_offset
  | Table     +-------------------+   |  |
  |           | Section header 2  |---|--|--+
  +---------> +-------------------+   |  |  |
              | Section 0         |<--+  |  |
              +-------------------+      |  | sh_offset
              | Section 1         |<-----+  |
              +-------------------+         |
              | Section 2         |<--------+
              +-------------------+
``
But nothing (except sanity) prevents the following topology:
``
              +-------------------+
              | ELF header        |---+ e_shoff
              +-------------------+   |
              | Section 1         |<--|--+
  +---------> +-------------------+   |  |
  |           |                   |<--+  | sh_offset
  | Section   | Section header 0  |      |
  |           |                   |------|---------+
  | Header    +-------------------+      |         |
  |           | Section header 1  |------+         |
  | Table     +-------------------+                |
  |           | Section header 2  |---+            | sh_offset
  +---------> +-------------------+   | sh_offset  |
              | Section 2         |<--+            |
              +-------------------+                |
              | Section 0         |<---------------+
              +-------------------+
``
But some newbies may prefer PNGs :-)
\Image[https://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/ELF_Executable_and_Linkable_Format_diagram_by_Ange_Albertini.png/1280px-ELF_Executable_and_Linkable_Format_diagram_by_Ange_Albertini.png]
{title=ELF Executable and Linkable Format diagram by Ange Albertini}
{height=900}
{source=https://github.com/corkami/pics/blob/28cb0226093ed57b348723bc473cea0162dad366/binary/elf101/elf101.pdf}
= Section vs segment
{parent=elf hello world}
We will get into more detail later, but it is good to have it in mind now:
* section: exists before linking, in object files.
One or more sections will be put inside a single segment by the linker.
The major piece of information that a section contains for the linker is whether it is:
* raw data to be loaded into memory, e.g. `.data`, `.text`, etc.
* or metadata about other sections, that will be used by the linker but disappears at runtime, e.g. `.symtab`, `.strtab`, `.rela.text`
* segment: exists after linking, in the executable file.
Contains information about how each segment should be loaded into memory by the OS, notably location and permissions.
See also:
* https://stackoverflow.com/questions/14361248/whats-the-difference-of-section-and-segment-in-elf-file-format
* https://stackoverflow.com/questions/23379880/difference-between-program-header-and-section-header-in-elf
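To make the segment side concrete, here is a minimal C sketch that decodes the two program headers of `hello_world.out`, i.e. bytes 0x40 to 0xAF of the executable `hd` output above, assuming a little-endian ELF64 file. `struct seg`, `parse_phdr` and `perms` are hypothetical names for illustration. The real struct is `Elf64_Phdr` from `<elf.h>`, and the `RWX` string is our own format, not the one `readelf` prints.

```c
#include <stdint.h>

/* Little-endian readers (EI_DATA = ELFDATA2LSB). */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static uint64_t rd64(const uint8_t *p) {
    return (uint64_t)rd32(p) | (uint64_t)rd32(p + 4) << 32;
}

/* Hypothetical subset of Elf64_Phdr: where a segment is and how to map it. */
struct seg {
    uint32_t type;    /* p_type: 1 is PT_LOAD */
    uint32_t flags;   /* p_flags: PF_X = 1, PF_W = 2, PF_R = 4 */
    uint64_t offset;  /* p_offset: start in the file */
    uint64_t vaddr;   /* p_vaddr: start in memory */
    uint64_t filesz;  /* p_filesz: bytes taken from the file */
    uint64_t memsz;   /* p_memsz: bytes occupied in memory */
};

/* Decode one 56-byte Elf64_Phdr entry starting at p. */
static struct seg parse_phdr(const uint8_t *p) {
    struct seg s;
    s.type   = rd32(p + 0);
    s.flags  = rd32(p + 4);
    s.offset = rd64(p + 8);
    s.vaddr  = rd64(p + 16);
    /* p + 24 is p_paddr and p + 48 is p_align: not needed here */
    s.filesz = rd64(p + 32);
    s.memsz  = rd64(p + 40);
    return s;
}

/* Render p_flags as a 3-letter "RWX" string, '-' for missing bits. */
static void perms(uint32_t flags, char out[4]) {
    out[0] = (flags & 4) ? 'R' : '-';
    out[1] = (flags & 2) ? 'W' : '-';
    out[2] = (flags & 1) ? 'X' : '-';
    out[3] = '\0';
}
```

Fed those 112 bytes, it yields two segments. The first is an `R-X` segment at 0x400000 that covers file bytes 0 to 0xD6 (the headers and `.text`). The second is an `RW-` segment at 0x6000d8 that holds the 0xD bytes of `.data`. That is the location and permission information described above.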
= ELF header
{parent=elf hello world}
Running:
``
readelf -h hello_world.o
``
outputs:
``
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: REL (Relocatable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x0
Start of program headers: 0 (bytes into file)
Start of section headers: 64 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 0 (bytes)
Number of program headers: 0
Size of section headers: 64 (bytes)
Number of section headers: 7
Section header string table index: 3
``
Running:
``
readelf -h hello_world.out
``
outputs:
``
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4000b0
Start of program headers: 64 (bytes into file)
Start of section headers: 272 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 2
Size of section headers: 64 (bytes)
Number of section headers: 6
Section header string table index: 3
``
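`man elf` documents the C structs that glibc declares in `<elf.h>`. As a cross-check of the sizes `readelf` reports above, this sketch fails to compile if `sizeof` or a few field offsets disagree with the header bytes. It assumes a Linux toolchain that ships `<elf.h>` and a C11 compiler for `_Static_assert`.

```c
#include <elf.h>     /* Linux/glibc: Elf64_Ehdr, Elf64_Phdr, Elf64_Shdr */
#include <stddef.h>  /* offsetof */

/* The sizes readelf printed for this ELF64 file. */
_Static_assert(sizeof(Elf64_Ehdr) == 64, "Size of this header");
_Static_assert(sizeof(Elf64_Phdr) == 56, "Size of program headers");
_Static_assert(sizeof(Elf64_Shdr) == 64, "Size of section headers");

/* Where some ELF header fields sit, matching the hd dump of the header. */
_Static_assert(offsetof(Elf64_Ehdr, e_type) == 0x10, "e_type");
_Static_assert(offsetof(Elf64_Ehdr, e_entry) == 0x18, "e_entry");
_Static_assert(offsetof(Elf64_Ehdr, e_shoff) == 0x28, "e_shoff");
_Static_assert(offsetof(Elf64_Ehdr, e_shstrndx) == 0x3e, "e_shstrndx");
```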
Bytes in the object file:
``
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 01 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 |..>.............|
00000020 00 00 00 00 00 00 00 00 40 00 00 00 00 00 00 00 |........@.......|
00000030 00 00 00 00 40 00 00 00 00 00 40 00 07 00 03 00 |....@.....@.....|
``
Executable:
``
00000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 |.ELF............|
00000010 02 00 3e 00 01 00 00 00 b0 00 40 00 00 00 00 00 |..>.......@.....|
00000020 40 00 00 00 00 00 00 00 10 01 00 00 00 00 00 00 |@...............|
00000030 00 00 00 00 40 00 38 00 02 00 40 00 06 00 03 00 |....@.8...@.....|
``
Structure represented:
``
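#define EI_NIDENT 16

/* Comments: byte offset of each field from the start of the file. */
typedef struct {
    unsigned char e_ident[EI_NIDENT]; /* 0x00 */
    Elf64_Half    e_type;             /* 0x10 */
    Elf64_Half    e_machine;          /* 0x12 */
    Elf64_Word    e_version;          /* 0x14 */
    Elf64_Addr    e_entry;            /* 0x18 */
    Elf64_Off     e_phoff;            /* 0x20 */
    Elf64_Off     e_shoff;            /* 0x28 */
    Elf64_Word    e_flags;            /* 0x30 */
    Elf64_Half    e_ehsize;           /* 0x34 */
    Elf64_Half    e_phentsize;        /* 0x36 */
    Elf64_Half    e_phnum;            /* 0x38 */
    Elf64_Half    e_shentsize;        /* 0x3A */
    Elf64_Half    e_shnum;            /* 0x3C */
    Elf64_Half    e_shstrndx;         /* 0x3E */
} Elf64_Ehdr;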
``
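The same fields can be decoded by hand. Below is a minimal C sketch, assuming the 64 header bytes are already in memory. `struct ehdr`, `parse_ehdr` and the `rdN` helpers are hypothetical names. It reads each field byte by byte in little endian, as `EI_DATA` = `ELFDATA2LSB` requires. It does not cast the buffer to `Elf64_Ehdr`, so it does not depend on the host's endianness or on alignment.

```c
#include <stdint.h>

/* ELFDATA2LSB: multi-byte fields are stored least significant byte first. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)rd16(p) | (uint32_t)rd16(p + 2) << 16;
}
static uint64_t rd64(const uint8_t *p) {
    return (uint64_t)rd32(p) | (uint64_t)rd32(p + 4) << 32;
}

/* Hypothetical host-side copy of the Elf64_Ehdr fields used here. */
struct ehdr {
    uint16_t type, machine;
    uint64_t entry, phoff, shoff;
    uint16_t phentsize, phnum, shentsize, shnum, shstrndx;
};

/* Decode the 64-byte ELF64 header at b, using the offsets of Elf64_Ehdr. */
static struct ehdr parse_ehdr(const uint8_t b[64]) {
    struct ehdr h;
    h.type      = rd16(b + 0x10);
    h.machine   = rd16(b + 0x12);
    h.entry     = rd64(b + 0x18);
    h.phoff     = rd64(b + 0x20);
    h.shoff     = rd64(b + 0x28);
    h.phentsize = rd16(b + 0x36);
    h.phnum     = rd16(b + 0x38);
    h.shentsize = rd16(b + 0x3a);
    h.shnum     = rd16(b + 0x3c);
    h.shstrndx  = rd16(b + 0x3e);
    return h;
}
```

Applied to the executable bytes above, it gives `e_entry` 0x4000b0, `e_phoff` 64, `e_shoff` 272, 2 program headers of 56 bytes and 6 section headers. These are the same numbers `readelf -h` prints.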
Manual breakdown:
* 0 0: `EI_MAG` = `7f 45 4c 46` = `0x7f, 'E', 'L', 'F'`: ELF magic number
* 0 4: `EI_CLASS` = `02` = `ELFCLASS64`: 64 bit elf
* 0 5: `EI_DATA` = `01` = `ELFDATA2LSB`: little endian data
* 0 6: `EI_VERSION` = `01`: format version
* 0 7: `EI_OSABI` (only in 2003 Update) = `00` = `ELFOSABI_NONE`: no extensions.
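* 0 8: `EI_ABIVERSION` (only in 2003 Update) = `00`: ABI version, shown as `ABI Version: 0` by `readelf`.
* 0 9: `EI_PAD` = 7x `00`: reserved bytes. Must be set to 0.
* 1 0: `e_type` = `01 00` = 1 (little endian) = `ET_REL`: relocatable format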
On the executable it is `02 00` for `ET_EXEC`.
Another important possibility for the executable is `ET_DYN` for PIE executables and shared libraries.
`ET_DYN` tells the Linux kernel that the code is position independent, and can be loaded at a random memory location with ASLR.
This is explained further at:
* https://stackoverflow.com/questions/2463150/what-is-the-fpie-option-for-position-independent-executables-in-gcc-and-ld/51308031#51308031
* https://stackoverflow.com/questions/34519521/why-does-gcc-create-a-shared-object-instead-of-an-executable-binary-according-to/55704865#55704865
* 1 2: `e_machine` = `3e 00` = `62` = `EM_X86_64`: AMD64 architecture
* 1 4: `e_version` = `01 00 00 00`: must be 1
* 1 8: `e_entry` = 8x `00`: execution address entry point, or 0 if not applicable like for the object file since there is no entry point.
On the executable, it is `b0 00 40 00 00 00 00 00` = `0x4000b0`. The kernel sets RIP directly to that value when executing. It can be configured by the linker script or `-e`. But it will segfault if you set it too low: https://stackoverflow.com/questions/2187484/why-is-the-elf-execution-entry-point-virtual-address-of-the-form-0x80xxxxx-and-n
* 2 0: `e_phoff` = 8x `00`: program header table offset, 0 if not present.
`40` 7x `00` = `0x40` on the executable, i.e. it starts immediately after the ELF header.
* 2 8: `e_shoff` = `40` 7x `00` = `0x40`: section header table file offset, 0 if not present.
* 3 0: `e_flags` = `00 00 00 00` Arch specific. `i386` docs say:
> The Intel386 architecture defines no flags; so this member contains zero.
* 3 4: `e_ehsize` = `40 00`: size of this elf header. TODO why this field needed? Isn't the size fixed?
* 3 6: `e_phentsize` = `00 00`: size of each program header, 0 if not present.
`38 00` on executable: it is 56 bytes long
* 3 8: `e_phnum` = `00 00`: number of program header entries, 0 if not present.
`02 00` on executable: there are 2 entries.
* 3 A: `e_shentsize` and `e_shnum` = `40 00 07 00`: section header size and number of entries
* 3 E: `e_shstrndx` (`Section Header STRing iNDeX`) = `03 00`: index of the `.shstrtab` section.
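The identification checks and the `e_type` values from the breakdown can be sketched in C as follows. These are hypothetical helpers, not how `readelf` or the kernel structure their code. They only accept the little-endian ELF64 case used in this tutorial.

```c
#include <stdint.h>
#include <string.h>

/* Return NULL if b starts with a little-endian ELF64 identification,
 * otherwise a short message naming the first e_ident byte that differs. */
static const char *check_ident(const uint8_t *b) {
    static const uint8_t magic[4] = {0x7f, 'E', 'L', 'F'};
    if (memcmp(b, magic, 4) != 0) return "bad magic";    /* EI_MAG0..3 */
    if (b[4] != 2) return "not ELFCLASS64";              /* EI_CLASS */
    if (b[5] != 1) return "not ELFDATA2LSB";             /* EI_DATA */
    if (b[6] != 1) return "bad EI_VERSION";              /* EI_VERSION */
    return NULL;
}

/* Name the e_type values discussed above; e_type is little endian at 0x10. */
static const char *type_name(const uint8_t *b) {
    switch (b[0x10] | b[0x11] << 8) {
    case 1: return "ET_REL";
    case 2: return "ET_EXEC";
    case 3: return "ET_DYN";
    default: return "other";
    }
}
```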
= Section header table
{parent=elf hello world}
Array of `Elf64_Shdr` structs.
Each entry contains metadata about a given section.
`e_shoff` of the ELF header gives the starting position, 0x40 here.
`e_shentsize` and `e_shnum` from the ELF header say that we have 7 entries, each `0x40` bytes long.
So the table takes bytes from 0x40 to `0x40 + 7 * 0x40 - 1` = 0x1FF.
Some section names are reserved for certain section types: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections e.g. `.text` requires a `SHT_PROGBITS` type and `SHF_ALLOC` + `SHF_EXECINSTR`.
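The same arithmetic applies to any header table: it starts at its offset and spans `num * entsize` bytes. Here is a small C sketch with a hypothetical `table_range` helper. It assumes `num > 0`.

```c
#include <stdint.h>

/* Inclusive byte range [first, last] of a table of `num` entries of
 * `entsize` bytes starting at file offset `off`. Assumes num > 0. */
struct range { uint64_t first, last; };

static struct range table_range(uint64_t off, uint16_t entsize, uint16_t num) {
    struct range r = { off, off + (uint64_t)entsize * num - 1 };
    return r;
}
```

For `hello_world.o`, `table_range(0x40, 0x40, 7)` gives 0x40 to 0x1FF. For `hello_world.out`, the program header table (`e_phoff` 0x40, 2 entries of 56 bytes) ends at 0xAF. That is right before the code at 0xB0, which `e_entry` 0x4000b0 points to. The section header table (`e_shoff` 0x110, 6 entries of 0x40 bytes) spans 0x110 to 0x28F.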
Running:
``
readelf -S hello_world.o
``
outputs:
``
There are 7 section headers, starting at offset 0x40:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .data PROGBITS 0000000000000000 00000200
000000000000000d 0000000000000000 WA 0 0 4
[ 2] .text PROGBITS 0000000000000000 00000210
0000000000000027 0000000000000000 AX 0 0 16
[ 3] .shstrtab STRTAB 0000000000000000 00000240
0000000000000032 0000000000000000 0 0 1
[ 4] .symtab SYMTAB 0000000000000000 00000280
00000000000000a8 0000000000000018 5 6 4
[ 5] .strtab STRTAB 0000000000000000 00000330
0000000000000034 0000000000000000 0 0 1
[ 6] .rela.text RELA 0000000000000000 00000370
0000000000000018 0000000000000018 4 2 4
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), l (large)
I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
O (extra OS processing required) o (OS specific), p (processor specific)
``
The `struct` represented by each entry is:
``
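/* Comments: byte offset of each field within the 64-byte entry. */
typedef struct {
    Elf64_Word  sh_name;      /* 0x00 */
    Elf64_Word  sh_type;      /* 0x04 */
    Elf64_Xword sh_flags;     /* 0x08 */
    Elf64_Addr  sh_addr;      /* 0x10 */
    Elf64_Off   sh_offset;    /* 0x18 */
    Elf64_Xword sh_size;      /* 0x20 */
    Elf64_Word  sh_link;      /* 0x28 */
    Elf64_Word  sh_info;      /* 0x2C */
    Elf64_Xword sh_addralign; /* 0x30 */
    Elf64_Xword sh_entsize;   /* 0x38 */
} Elf64_Shdr;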
``
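Decoding one entry is mechanical once the field offsets are known. Below is a sketch under the same little-endian ELF64 assumption. `struct shdr` and `section_header` are hypothetical names, and there is no bounds checking. Entry `i` lives at `e_shoff + i * e_shentsize`, so for `hello_world.o` entry 1 (`.data`) is at 0x40 + 0x40 = 0x80.

```c
#include <stdint.h>

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static uint64_t rd64(const uint8_t *p) {
    return (uint64_t)rd32(p) | (uint64_t)rd32(p + 4) << 32;
}

/* Hypothetical host-side copy of Elf64_Shdr, fields decoded little endian. */
struct shdr {
    uint32_t name, type;
    uint64_t flags, addr, offset, size;
    uint32_t link, info;
    uint64_t addralign, entsize;
};

/* Decode section header number `ndx` of a whole-file image, given e_shoff
 * and e_shentsize from the ELF header. No bounds checking in this sketch. */
static struct shdr section_header(const uint8_t *file, uint64_t shoff,
                                  uint16_t shentsize, uint16_t ndx) {
    const uint8_t *p = file + shoff + (uint64_t)shentsize * ndx;
    struct shdr s;
    s.name      = rd32(p + 0x00);
    s.type      = rd32(p + 0x04);
    s.flags     = rd64(p + 0x08);
    s.addr      = rd64(p + 0x10);
    s.offset    = rd64(p + 0x18);
    s.size      = rd64(p + 0x20);
    s.link      = rd32(p + 0x28);
    s.info      = rd32(p + 0x2c);
    s.addralign = rd64(p + 0x30);
    s.entsize   = rd64(p + 0x38);
    return s;
}
```

On the `.data` entry, this yields `sh_name` 1, `sh_type` 1, `sh_flags` 3, `sh_offset` 0x200 and `sh_size` 0xD. These are the values broken down by hand in the `.data` section.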
= Sections
{parent=elf hello world}
= Index 0 section
{parent=Sections}
Contained in bytes 0x40 to 0x7F.
The first section is always magic: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html says:
> If the number of sections is greater than or equal to SHN_LORESERVE (0xff00), e_shnum has the value SHN_UNDEF (0) and the actual number of section header table entries is contained in the sh_size field of the section header at index 0 (otherwise, the sh_size member of the initial entry contains 0).
There are also other magic sections detailed in `Figure 4-7: Special Section Indexes`.
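The escape hatch for huge section counts quoted above can be written as a small C function. This is a sketch that assumes the caller has already decoded `e_shoff`, `e_shnum` and the `sh_size` of entry 0. `section_count` is a hypothetical name.

```c
#include <stdint.h>

/* Number of section header table entries, following the rule quoted above:
 * if e_shnum is SHN_UNDEF (0) while a table exists, the real count is in
 * sh_size of section header 0, passed in here as sh0_size. */
static uint64_t section_count(uint64_t e_shoff, uint16_t e_shnum,
                              uint64_t sh0_size) {
    if (e_shoff == 0)
        return 0;          /* no section header table at all */
    if (e_shnum != 0)
        return e_shnum;    /* the common case, e.g. 7 in hello_world.o */
    return sh0_size;       /* >= SHN_LORESERVE (0xff00) sections */
}
```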
= `SHT_NULL`
{parent=Index 0 section}
In index 0, `SHT_NULL` is mandatory. Are there any other uses for it: https://stackoverflow.com/questions/26812142/what-is-the-use-of-the-sht-null-section-in-elf ?
= `.data` section
{parent=Sections}
`.data` is section 1:
``
00000080 01 00 00 00 01 00 00 00 03 00 00 00 00 00 00 00 |................|
00000090 00 00 00 00 00 00 00 00 00 02 00 00 00 00 00 00 |................|
000000a0 0d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000b0 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
``
* 80 0: `sh_name` = `01 00 00 00`: index 1 in the `.shstrtab` string table
Here, `1` says that the name of this section starts at byte 1 of that section (byte 0 is a NUL), and ends at the next NUL character, making up the string `.data`.
`.data` is one of the section names which has a predefined meaning according to http://www.sco.com/developers/gabi/2003-12-17/ch4.strtab.html[]:
> These sections hold initialized data that contribute to the program's memory image.
* 80 4: `sh_type` = `01 00 00 00`: `SHT_PROGBITS`: the section content is not specified by ELF, only by how the program interprets it. As expected for a `.data` section.
* 80 8: `sh_flags` = `03` 7x `00`: `SHF_WRITE` and `SHF_ALLOC`: http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#sh_flags, as required for a `.data` section
* 90 0: `sh_addr` = 8x `00`: TODO: standard says:
> If the section will appear in the memory image of a process, this member gives the address at which the section's first byte should reside. Otherwise, the member contains 0.
but I don't understand it very well yet.
* 90 8: `sh_offset` = `00 02 00 00 00 00 00 00` = `0x200`: number of bytes from the start of the file to the first byte in this section
* a0 0: `sh_size` = `0d 00 00 00 00 00 00 00`
If we take 0xD bytes starting at `sh_offset` 0x200, we see:
``
00000200 48 65 6c 6c 6f 20 77 6f 72 6c 64 21 0a 00 |Hello world!.. |
``
AHA! So our `"Hello world!"` string is in the data section, as we told NASM to put it.
Once we graduate from `hd`, we will look this up like:
``
readelf -x .data hello_world.o
``
which outputs:
``
Hex dump of section '.data':
0x00000000 48656c6c 6f20776f 726c6421 0a Hello world!.
``
NASM sets decent properties for that section because it treats `.data` magically: https://www.nasm.us/doc/nasmdoc7.html#section-7.9.2
Also note that this was a bad section choice: a good C compiler would put the string in `.rodata` instead, because it is read-only and it would allow for further OS optimizations.
* a0 8: `sh_link` and `sh_info` = 8x 0: do not apply to this section type. http://www.sco.com/developers/gabi/2003-12-17/ch4.sheader.html#special_sections
* b0 0: `sh_addralign` = `04` = TODO: why is this alignment necessary? Is it only for `sh_addr`, or also for symbols inside `sh_addr`?
* b0 8: `sh_entsize` = `00` = the section does not contain a table. If != 0, it means that the section contains a table of fixed size entries. In this file, we see from the `readelf` output that this is the case for the `.symtab` and `.rela.text` sections.
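The field offsets walked through above can be checked with a small C sketch. It assumes a little-endian ELF64 file (as for this x86-64 object) that has already been read into memory; `Shdr64`, `parse_shdr` and `section_data` are our own hypothetical names, with the field layout of the gABI `Elf64_Shdr`. It decodes one 64-byte section header and then uses `sh_offset` and `sh_size` to locate the section contents, which is what we did by hand with `hd` for `.data`.
``
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Field layout of Elf64_Shdr as given by the gABI; Shdr64 is our own name. */
typedef struct {
    uint32_t sh_name;
    uint32_t sh_type;
    uint64_t sh_flags;
    uint64_t sh_addr;
    uint64_t sh_offset;
    uint64_t sh_size;
    uint32_t sh_link;
    uint32_t sh_info;
    uint64_t sh_addralign;
    uint64_t sh_entsize;
} Shdr64;

/* Little-endian readers: x86-64 ELF files are ELFDATA2LSB. */
static uint32_t le32(const unsigned char *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
static uint64_t le64(const unsigned char *p) {
    return (uint64_t)le32(p) | (uint64_t)le32(p + 4) << 32;
}

/* Decode one 64-byte section header; offsets match the "80 4", "80 8"... bullets. */
static Shdr64 parse_shdr(const unsigned char *p) {
    Shdr64 h;
    h.sh_name      = le32(p + 0x00);
    h.sh_type      = le32(p + 0x04);
    h.sh_flags     = le64(p + 0x08);
    h.sh_addr      = le64(p + 0x10);
    h.sh_offset    = le64(p + 0x18);
    h.sh_size      = le64(p + 0x20);
    h.sh_link      = le32(p + 0x28);
    h.sh_info      = le32(p + 0x2c);
    h.sh_addralign = le64(p + 0x30);
    h.sh_entsize   = le64(p + 0x38);
    return h;
}

/* Pointer to the section bytes inside the file buffer, or NULL if
   [sh_offset, sh_offset + sh_size) does not fit in the file.
   Not meaningful for SHT_NOBITS sections such as .bss, which occupy no file space. */
static const unsigned char *section_data(const unsigned char *file, size_t file_size,
                                         const Shdr64 *h) {
    if (h->sh_offset > file_size || h->sh_size > file_size - h->sh_offset)
        return NULL;
    return file + h->sh_offset;
}
``
Decoding the `.data` header should give `sh_offset == 0x200` and `sh_size == 13`, and the returned pointer then addresses the `Hello world!` bytes.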
= `.text` section
{parent=Sections}
Now that we've done one section manually, let's graduate and use `readelf -S` for the other sections:
``
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 2] .text             PROGBITS         0000000000000000  00000210
       0000000000000027  0000000000000000  AX       0     0     16
``
`.text` is allocated and executable (flags `AX`) but not writable: if the program tries to write to it at runtime, Linux kills it with a segfault. Let's see if we really have some code there:
``
objdump -d hello_world.o
``
gives:
``
hello_world.o:     file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <_start>:
   0:   b8 01 00 00 00          mov    $0x1,%eax
   5:   bf 01 00 00 00          mov    $0x1,%edi
   a:   48 be 00 00 00 00 00    movabs $0x0,%rsi
  11:   00 00 00
  14:   ba 0d 00 00 00          mov    $0xd,%edx
  19:   0f 05                   syscall
  1b:   b8 3c 00 00 00          mov    $0x3c,%eax
  20:   bf 00 00 00 00          mov    $0x0,%edi
  25:   0f 05                   syscall
``
If we grep `b8 01 00 00` in the `hd` output, we see that it only occurs at `00000210`, which is the section's `Offset`. And the `Size` is `0x27` = 39 bytes, which matches as well: the last instruction, the 2-byte `syscall` at `25`, ends at `0x27`. So we must be looking at the right section.
This looks like the right code: a `write` followed by an `exit`.
The most interesting part is the instruction at offset `a`, which does:
``
movabs $0x0,%rsi
``
to pass the address of the string to the system call. Currently, the `0x0` is just a placeholder. After linking happens, it will be modified to contain:
``
4000ba: 48 be d8 00 60 00 00 movabs $0x6000d8,%rsi
``
This modification is possible thanks to the data in the `.rela.text` section.
= `SHT_STRTAB`
{parent=Sections}
Sections with `sh_type == SHT_STRTAB` are called string tables.
They hold a sequence of NUL-terminated strings.
Such sections are used by other sections when string names are needed. The section using the strings specifies:
* which string table it is using
* the index in that string table where the string starts
So for example, we could have a string table containing:
``
Data: \0 a b c \0 d e f \0
Index: 0 1 2 3 4 5 6 7 8
``
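The first byte must be a 0, so that index 0 refers to the empty string, which is used to mean "no name". The last byte must also be a 0, which guarantees that every string is NUL-terminated.
And if another section wants to use the string `d e f`, it has to point to index `5` of this section (letter `d`).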
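Looking up a string by index can be sketched in C as follows, assuming the table is already in memory. `strtab_get` is a hypothetical helper name, not part of any library. Because the gABI allows an index to point into the middle of a string, index `6` of the example table is also valid and simply gives `ef`.
``
#include <stddef.h>
#include <string.h>

/* Return the NUL-terminated string starting at byte `idx` of a string table,
   or NULL if idx is out of range or no NUL follows it inside the table. */
static const char *strtab_get(const char *tab, size_t size, size_t idx) {
    if (idx >= size)
        return NULL;
    if (memchr(tab + idx, '\0', size - idx) == NULL)
        return NULL;
    return tab + idx;
}
``
With the example table stored as the 9 bytes `"\0abc\0def"` (the final NUL is implicit in the C literal), index `5` yields `def` and index `0` yields the empty string.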
Notable string table sections:
* `.shstrtab`
* `.strtab`
= `.shstrtab`
{parent=Sections}
Section type: `sh_type == SHT_STRTAB`.
Common name: "section header string table".
The section name `.shstrtab` is reserved. The standard says:
> This section holds section names.
This section is pointed to by the `e_shstrndx` field of the ELF header itself.
The `sh_name` field of each section header is an index into this section, giving the name of that section.
This section does not have `SHF_ALLOC` set, so it will not appear in the memory image of the running program.
``
readelf -x .shstrtab hello_world.o
``
outputs:
``
Hex dump of section '.shstrtab':
0x00000000 002e6461 7461002e 74657874 002e7368 ..data..text..sh
0x00000010 73747274 6162002e 73796d74 6162002e strtab..symtab..
0x00000020 73747274 6162002e 72656c61 2e746578 strtab..rela.tex
0x00000030 7400 t.
``
The data in this section has a fixed format: http://www.sco.com/developers/gabi/2003-12-17/ch4.strtab.html
If we look at the `sh_name` fields of the other section headers, we see that they contain indexes into this table, e.g. the `.text` section has `sh_name == 7`.
Each string then ends at the first NUL character, e.g. character `12` is the `\0` just after `.text`.
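The reverse lookup, from a name to its index, can be sketched in C with the actual bytes of `.shstrtab` copied from the dump above. `strtab_find` is a hypothetical helper name; it returns the first match. Since an index may point into the middle of a string, a tail such as `text` is also found, at index `8` inside `.text`, which is what allows a table to share common suffixes between names.
``
#include <stddef.h>
#include <string.h>

/* The 0x32 bytes of .shstrtab from the readelf -x dump (final NUL implicit). */
static const char shstrtab[] =
    "\0.data\0.text\0.shstrtab\0.symtab\0.strtab\0.rela.text";

/* First index at which `name` followed by a NUL appears in the table, or -1. */
static long strtab_find(const char *tab, size_t size, const char *name) {
    size_t len = strlen(name);
    for (size_t i = 0; i + len < size; i++) {
        if (memcmp(tab + i, name, len) == 0 && tab[i + len] == '\0')
            return (long)i;
    }
    return -1;
}
``
The indexes it returns for `.data`, `.text`, `.shstrtab`, `.symtab`, `.strtab` and `.rela.text` are 1, 7, 13, 23, 31 and 39, which are the `sh_name` values the section headers need.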
= `.symtab`
{parent=Sections}
Section type: `sh_type == SHT_SYMTAB`.
Common name: "symbol table".
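First we note that:
* `sh_link` = `5`
* `sh_info` = `6`
For `SHT_SYMTAB` sections, those numbers mean that:
* strings that give symbol names are in section 5, `.strtab`
* `sh_info` is one greater than the index of the last `STB_LOCAL` symbol: entries 0 to 5 are local, and the first global symbol is entry 6, `_start`, as can be seen in the `readelf -s` output below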
A good high level tool to dump that section is:
``
nm hello_world.o
``
which gives:
``
0000000000000000 T _start
0000000000000000 d hello_world
000000000000000d a hello_world_len
``
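This is however a high level view that omits some types of symbols and in which the symbol types are condensed into single letters: `T` for a global symbol in the text section, `d` for a local symbol in the data section and `a` for a local absolute symbol. A more detailed view can be obtained with: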
``
readelf -s hello_world.o
``
which gives:
``
Symbol table '.symtab' contains 7 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS hello_world.asm
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
     4: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 hello_world
     5: 000000000000000d     0 NOTYPE  LOCAL  DEFAULT  ABS hello_world_len
     6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    2 _start
``
The binary format of the table is documented at http://www.sco.com/developers/gabi/2003-12-17/ch4.symtab.html
The data is:
``
readelf -x .symtab hello_world.o
``
which gives:
``
Hex dump of section '.symtab':
0x00000000 00000000 00000000 00000000 00000000 ................
0x00000010 00000000 00000000 01000000 0400f1ff ................
0x00000020 00000000 00000000 00000000 00000000 ................
0x00000030 00000000 03000100 00000000 00000000 ................
0x00000040 00000000 00000000 00000000 03000200 ................
0x00000050 00000000 00000000 00000000 00000000 ................
0x00000060 11000000 00000100 00000000 00000000 ................
0x00000070 00000000 00000000 1d000000 0000f1ff ................
0x00000080 0d000000 00000000 00000000 00000000 ................
0x00000090 2d000000 10000200 00000000 00000000 -...............
0x000000a0 00000000 00000000 ........
``
The entries are of type:
``
typedef struct {
Elf64_Word st_name;
unsigned char st_info;
unsigned char st_other;
Elf64_Half st_shndx;
Elf64_Addr st_value;
Elf64_Xword st_size;
} Elf64_Sym;
``
Like in the section table, the first entry is magical and set to fixed, meaningless values (all zeroes here).
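A minimal C sketch that decodes one 24-byte entry of the raw `.symtab` dump field by field. `Sym64`, `parse_sym` and `le` are our own hypothetical names; the fields mirror `Elf64_Sym`, but the bytes are decoded by hand as little-endian rather than by casting, so the result does not depend on the host's struct layout or byte order.
``
#include <stdint.h>
#include <stddef.h>

/* Same fields as Elf64_Sym; Sym64 is our own name for it. */
typedef struct {
    uint32_t st_name;
    unsigned char st_info;
    unsigned char st_other;
    uint16_t st_shndx;
    uint64_t st_value;
    uint64_t st_size;
} Sym64;

enum { SYM64_SIZE = 24 }; /* 4 + 1 + 1 + 2 + 8 + 8 bytes on disk */

/* Read an n-byte little-endian unsigned integer. */
static uint64_t le(const unsigned char *p, int n) {
    uint64_t v = 0;
    for (int i = n - 1; i >= 0; i--)
        v = v << 8 | p[i];
    return v;
}

/* Decode entry `i` of a raw .symtab dump. */
static Sym64 parse_sym(const unsigned char *symtab, unsigned i) {
    const unsigned char *p = symtab + (size_t)i * SYM64_SIZE;
    Sym64 s;
    s.st_name  = (uint32_t)le(p, 4);
    s.st_info  = p[4];
    s.st_other = p[5];
    s.st_shndx = (uint16_t)le(p + 6, 2);
    s.st_value = le(p + 8, 8);
    s.st_size  = le(p + 16, 8);
    return s;
}
``
Applied to entry 1 of the dump, it should give `st_name == 1`, `st_info == 0x04` and `st_shndx == 0xfff1`; applied to entry 6, it should give `st_name == 0x2d`, `st_info == 0x10` and `st_shndx == 2`.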
= `STT_FILE`
{parent=.symtab}
Entry 1 has `ELF64_ST_TYPE(st_info) == STT_FILE`: the symbol type is stored in the low 4 bits of `st_info`.
Byte analysis:
* 10 8: `st_name` = `01000000` = character 1 in the `.strtab`, which, up to the following `\0`, makes `hello_world.asm`
This piece of information may be used by the linker to decide into which segment sections go: e.g. in an `ld` linker script we write:
``
segment_name :
{
file(section)
}
``
to pick a section from a given file.
Most of the time however, we will just dump all sections with a given name together with:
``
segment_name :
{
*(section)
}
``
* 10 12: `st_info` = `04`
Bits 0-3 = `ELF64_ST_TYPE` = Type = `4` = `STT_FILE`: the main purpose of this entry is to use `st_name` to indicate the name of the file which generated this object file.
Bits 4-7 = `ELF64_ST_BIND` = Binding = `0` = `STB_LOCAL`. Required value for `STT_FILE`.
* 10 13: `st_other` = `00`: default symbol visibility, `STV_DEFAULT`
* 10 14: `st_shndx` = Symbol Table Section header Index = bytes `f1 ff` = `0xfff1` = `SHN_ABS`. Required for `STT_FILE`.
* 20 0: `st_value` = 8x `00`: unused for `STT_FILE`
* 20 8: `st_size` = 8x `00`: no allocated size
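The bit split of `st_info` corresponds to the gABI macros `ELF64_ST_BIND`, `ELF64_ST_TYPE` and `ELF64_ST_INFO`, reproduced below with only the constants this file uses. `file_symbol_ok` is a hypothetical helper that checks the `STT_FILE` constraints listed above; it is a sketch, not a full validator.
``
#include <stdint.h>

/* gABI definitions for splitting and building st_info. */
#define ELF64_ST_BIND(i)    ((i) >> 4)
#define ELF64_ST_TYPE(i)    ((i) & 0xf)
#define ELF64_ST_INFO(b, t) (((b) << 4) + ((t) & 0xf))

enum { STB_LOCAL = 0, STB_GLOBAL = 1 };
enum { STT_NOTYPE = 0, STT_SECTION = 3, STT_FILE = 4 };
#define SHN_ABS 0xfff1

/* Hypothetical helper: an STT_FILE entry must be local and use SHN_ABS. */
static int file_symbol_ok(unsigned char st_info, uint16_t st_shndx) {
    return ELF64_ST_TYPE(st_info) == STT_FILE &&
           ELF64_ST_BIND(st_info) == STB_LOCAL &&
           st_shndx == SHN_ABS;
}
``
For example, `ELF64_ST_INFO(STB_GLOBAL, STT_NOTYPE)` is `0x10`, the `st_info` byte of `_start` in the dump, and `ELF64_ST_INFO(STB_LOCAL, STT_SECTION)` is the `03` of entries 2 and 3.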
Now from the `readelf`, we interpret the others quickly.
= `STT_SECTION`
{parent=.symtab}
There are two such entries, one pointing to `.data` and the other to `.text` (section indexes `1` and `2`).
``
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     2: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     3: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
``
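According to the gABI, `STT_SECTION` symbols exist primarily for relocation: they let a relocation refer to a section as a whole. For example, the only relocation in `.rela.text` refers to symbol 2, the `STT_SECTION` symbol of `.data`, rather than to `hello_world`, which is why `readelf -r` shows it as `.data + 0`.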
= `STT_NOTYPE`
{parent=.symtab}
Then come the most important symbols:
``
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     4: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    1 hello_world
     5: 000000000000000d     0 NOTYPE  LOCAL  DEFAULT  ABS hello_world_len
     6: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    2 _start
``
The `hello_world` string is in the `.data` section (index 1). Its value is 0: it points to the first byte of that section.
`_start` is marked with `GLOBAL` binding since we wrote:
``
global _start
``
in NASM. This is necessary since the linker must see it to use it as the entry point. Unlike in C, NASM labels are local by default.
= `SHN_ABS`
{parent=STT_NOTYPE}
`hello_world_len` has the special `st_shndx == SHN_ABS == 0xfff1`, stored as the bytes `f1 ff` in little endian.
`0xfff1` lies in the reserved range from `SHN_LORESERVE` (`0xff00`) to `SHN_HIRESERVE` (`0xffff`), so it cannot conflict with the index of a real section.
`st_value == 0xD == 13`, which is the value we stored there in the assembly: the length of the string `Hello world!` plus its trailing newline.
This means that relocation will not affect this value: it is a constant.
This is a small optimization that our assembler does for us and which has ELF support.
If we had used the address of `hello_world_len` anywhere, the assembler would not have been able to mark it as `SHN_ABS`, and the linker would have had extra relocation work to do on it later.
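The effect of `SHN_ABS` can be modeled with a deliberately simplified C sketch. `final_symbol_value` is a hypothetical helper which assumes the linker has already chosen a final address for each input section, and that a symbol's final value is its section's address plus `st_value`. Real linkers handle many more cases (undefined symbols, `SHN_COMMON`, merged sections).
``
#include <stdint.h>

#define SHN_ABS 0xfff1

/* Simplified model: relocate st_value by the final address of its section,
   except for SHN_ABS symbols, whose st_value is already final.
   section_addr[] is indexed by st_shndx. */
static uint64_t final_symbol_value(uint64_t st_value, uint16_t st_shndx,
                                   const uint64_t *section_addr) {
    if (st_shndx == SHN_ABS)
        return st_value;
    return section_addr[st_shndx] + st_value;
}
``
With `.data` at `0x6000d8` (from the linked `movabs`) and `.text` at `0x4000b0` (since the instruction at offset `a` ends up at `4000ba`), `hello_world` becomes `0x6000d8` and `_start` becomes `0x4000b0`, while `hello_world_len` stays `0xd`.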
= `SHT_SYMTAB` on the executable
{parent=.symtab}
By default, the linker `ld` also keeps a `.symtab` in the executable.
It is not needed to run the program, only for debugging: without the symbols, we are completely blind, and must reverse engineer everything.
You can strip it, e.g. with `strip` or `objcopy --strip-all`, and the executable will still run. Such executables are called "stripped executables".
= `.strtab`
{parent=Sections}
Holds strings for the symbol table.
This section has `sh_type == SHT_STRTAB`.
It is pointed to by `sh_link == 5` of the `.symtab` section.
``
readelf -x .strtab hello_world.o
``
outputs:
``
Hex dump of section '.strtab':
0x00000000 0068656c 6c6f5f77 6f726c64 2e61736d .hello_world.asm
0x00000010 0068656c 6c6f5f77 6f726c64 0068656c .hello_world.hel
0x00000020 6c6f5f77 6f726c64 5f6c656e 005f7374 lo_world_len._st
0x00000030 61727400 art.
``
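The assembler's side of this table can be sketched as a builder that appends each name with its terminating NUL after the mandatory leading NUL, and returns the index to store in `st_name`. `StrtabBuilder`, `strtab_init` and `strtab_add` are hypothetical names, with a fixed 256-byte buffer for simplicity and no deduplication of strings.
``
#include <stddef.h>
#include <string.h>

/* Hypothetical minimal string table builder. */
typedef struct {
    char buf[256];
    size_t size;
} StrtabBuilder;

static void strtab_init(StrtabBuilder *b) {
    b->buf[0] = '\0';   /* index 0 is the empty string */
    b->size = 1;
}

/* Append `name` plus its NUL; return its index, or (size_t)-1 if it does not fit.
   strlen stops at the first NUL, so a name can never contain one. */
static size_t strtab_add(StrtabBuilder *b, const char *name) {
    size_t len = strlen(name) + 1;
    if (len > sizeof b->buf - b->size)
        return (size_t)-1;
    size_t idx = b->size;
    memcpy(b->buf + idx, name, len);
    b->size += len;
    return idx;
}
``
Adding `hello_world.asm`, `hello_world`, `hello_world_len` and `_start` in that order yields the indexes `0x01`, `0x11`, `0x1d` and `0x2d` used by `.symtab`, and a table size of `0x34`, matching the dump.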
This implies that it is an ELF-level limitation that symbol names cannot contain NUL characters.
= `.rela.text`
{parent=Sections}
Section type: `sh_type == SHT_RELA`.
Common name: "relocation section".
`.rela.text` holds relocation data, which says how the code in `.text` must be modified when the final executable is linked: it points to the bytes of `.text` that must be patched to contain the correct memory addresses.
Basically, it translates the object text containing the placeholder 0x0 address:
``
a: 48 be 00 00 00 00 00 movabs $0x0,%rsi
11: 00 00 00
``
to the actual executable code containing the final 0x6000d8:
``
4000ba: 48 be d8 00 60 00 00 movabs $0x6000d8,%rsi
4000c1: 00 00 00
``
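`.rela.text` is section 6, but that is not what the `sh_info` = `6` of `.symtab` refers to. It is `.rela.text` whose header points to `.symtab` through its own `sh_link` field, and to `.text`, the section being relocated, through its `sh_info` field.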
`readelf -r hello_world.o` outputs:
``
Relocation section '.rela.text' at offset 0x3b0 contains 1 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000000c  000200000001 R_X86_64_64       0000000000000000 .data + 0
``
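For `R_X86_64_64`, the x86-64 psABI defines the value to write as `S + A`: the final address of the referenced symbol plus the addend, stored as 64 bits at `r_offset`. The C sketch below applies that single relocation to the `.text` bytes of the object. `apply_r_x86_64_64` is a hypothetical helper, and it assumes `S` is already known (here `0x6000d8`, the final address of `.data`, referenced through symbol 2). `ELF64_R_SYM` and `ELF64_R_TYPE` follow the gABI definitions for splitting `r_info`.
``
#include <stdint.h>
#include <stddef.h>

/* gABI macros for splitting r_info. */
#define ELF64_R_SYM(i)  ((i) >> 32)
#define ELF64_R_TYPE(i) ((i) & 0xffffffffL)
#define R_X86_64_64 1

/* Write S + A as a little-endian 64-bit value at byte r_offset of the
   section being relocated. Returns 0 on success, -1 otherwise. */
static int apply_r_x86_64_64(unsigned char *text, size_t text_size,
                             uint64_t r_offset, uint64_t r_info,
                             int64_t r_addend, uint64_t sym_value) {
    if (ELF64_R_TYPE(r_info) != R_X86_64_64)
        return -1;
    if (r_offset > text_size || text_size - r_offset < 8)
        return -1;
    uint64_t v = sym_value + (uint64_t)r_addend;
    for (int k = 0; k < 8; k++)
        text[r_offset + k] = (unsigned char)(v >> (8 * k));
    return 0;
}
``
With `r_offset = 0xc`, `r_info = 0x000200000001` and `r_addend = 0`, the eight placeholder bytes after `48 be` become `d8 00 60 00 00 00 00 00`, the little-endian encoding of `0x6000d8` seen in the linked disassembly.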
The section does not exist in the executable.
The actual bytes are:
``
00000370 0c 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00 |................|
00000380 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
``
The `struct` represented is:
``