Generating executable files from scratch

Cameron Swinoga edited this page Jun 9, 2017 · 18 revisions

Introduction

The first step in getting the operating system to execute arbitrary assembly is to figure out which executable file formats your operating system supports. Since I would be developing my compiler on Linux, I chose the Executable and Linkable Format (ELF). There is a veritable wealth of information on the composition of ELF files; however, because the ELF standard is very large and overarching, it is not a simple matter to pick out just what is needed to get a bare-minimum example working. As such, I am writing this as a compendium of all the research and piecing together that I did to be able to write YABFC.

Starting at the top

Looking through the documentation for the system header elf.h, there are a few structures that we can use to set up the executable file. For certain reasons I will be using the 64-bit version of an ELF executable rather than the 32-bit version. The first few lines of setup are pretty straightforward and rigorously defined:

Elf64_Ehdr ELFHeader; // Declare the ELF header; its fields are filled in below

ELFHeader.e_ident[EI_MAG0]       = 0x7f; // Magic numbers
ELFHeader.e_ident[EI_MAG1]       = 'E';
ELFHeader.e_ident[EI_MAG2]       = 'L';
ELFHeader.e_ident[EI_MAG3]       = 'F';
ELFHeader.e_ident[EI_CLASS]      = ELFCLASS64;    // 64 bit ELF
ELFHeader.e_ident[EI_DATA]       = ELFDATA2LSB;   // little-endian
ELFHeader.e_ident[EI_VERSION]    = EV_CURRENT;    // Current version
ELFHeader.e_ident[EI_OSABI]      = ELFOSABI_SYSV; // UNIX System V ABI
ELFHeader.e_ident[EI_ABIVERSION] = 0x0;           // ABI version needs to be 0

for (int i = EI_PAD; i < EI_NIDENT; i++) ELFHeader.e_ident[i] = 0x0; // Zero padding

ELFHeader.e_type    = ET_EXEC;            // Executable file
ELFHeader.e_machine = EM_X86_64;          // AMD x86-64
ELFHeader.e_version = EV_CURRENT;         // Current version
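
As a quick sanity check on the identification bytes set above, the four magic bytes can be compared against the `ELFMAG` string that elf.h provides. This is only a minimal sketch; `sketch_has_elf_magic` is a hypothetical helper name and is not part of YABFC.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the first SELFMAG (4) bytes of an
 * e_ident array are 0x7f 'E' 'L' 'F', and 0 otherwise. */
int sketch_has_elf_magic(const unsigned char *ident)
{
    assert(ident != NULL);
    return memcmp(ident, ELFMAG, SELFMAG) == 0;
}
```

Calling `sketch_has_elf_magic(ELFHeader.e_ident)` after the setup lines above should return 1 if the magic numbers were written correctly.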

After this, things start to get a little more complicated. We need to configure the entry point of the program, the program and section header table offsets, and the header sizes. The ELF specification does not dictate where the different sections are placed in the file, as long as the offsets recorded in the headers correspond to the correct data. An important distinction here is the difference between a location in the file and a location in program runtime memory, hereafter referred to as file location (_FILE_LOC) and memory location (_MEM_LOC).
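
To make the fields mentioned above concrete, here is a minimal sketch of how the entry point, table offsets, and header sizes could be filled in for one possible layout. It assumes (none of this is stated in the text) a single program header placed directly after the ELF header, no section header table, and machine code starting right after the program header. `sketch_fill_layout` and `base_vaddr` are hypothetical names.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill in the layout-related fields of an ELF header.
 * Assumed file layout: [ELF header][one program header][machine code]. */
void sketch_fill_layout(Elf64_Ehdr *h, uint64_t base_vaddr)
{
    assert(h != NULL);

    /* File offset at which the machine code starts in this assumed layout */
    uint64_t code_file_off = sizeof(Elf64_Ehdr) + sizeof(Elf64_Phdr);

    h->e_entry     = base_vaddr + code_file_off; // Virtual address of the first instruction
    h->e_phoff     = sizeof(Elf64_Ehdr);         // Program header table follows the ELF header
    h->e_shoff     = 0;                          // No section header table in this sketch
    h->e_flags     = 0;                          // No processor-specific flags
    h->e_ehsize    = sizeof(Elf64_Ehdr);         // Size of the ELF header itself
    h->e_phentsize = sizeof(Elf64_Phdr);         // Size of one program header entry
    h->e_phnum     = 1;                          // One program header
    h->e_shentsize = sizeof(Elf64_Shdr);         // Size of one section header entry
    h->e_shnum     = 0;                          // No section headers
    h->e_shstrndx  = SHN_UNDEF;                  // No section name string table
}
```

Note that `e_entry` is a memory location while `e_phoff` and `e_shoff` are file locations, which is exactly the distinction drawn above.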

File location is a byte offset into the file that you are creating (an offset, because the operating system abstracts away the ACTUAL physical address). It tells the operating system where to look in your file in order to read the correct data. That data is then placed into the program's runtime memory (its virtual address space), where the program can read it while it runs. With that said, we need to start by picking a virtual address on which to base the program.
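
The relationship between the two kinds of location can be sketched with a single loadable segment. The example below assumes the whole file is mapped starting from file offset 0 at a chosen base virtual address, and a 4 KiB page size; both are assumptions for illustration, and the `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 0x1000 /* Assumed 4 KiB pages */

/* Hypothetical helper: translate a file location into a memory location,
 * assuming the file is loaded from offset 0 at base_vaddr. */
uint64_t sketch_file_to_mem(uint64_t base_vaddr, uint64_t file_off)
{
    return base_vaddr + file_off;
}

/* Hypothetical helper: describe one PT_LOAD segment covering the whole file. */
Elf64_Phdr sketch_load_segment(uint64_t base_vaddr, uint64_t file_size)
{
    Elf64_Phdr p = {0};

    p.p_type   = PT_LOAD;          // Segment is to be loaded into memory
    p.p_flags  = PF_R | PF_X;      // Readable and executable
    p.p_offset = 0;                // File location where the segment starts
    p.p_vaddr  = base_vaddr;       // Memory location where it is mapped
    p.p_paddr  = base_vaddr;       // Physical address; mirrored here, not relied upon
    p.p_filesz = file_size;        // Bytes taken from the file
    p.p_memsz  = file_size;        // Bytes occupied in memory
    p.p_align  = SKETCH_PAGE_SIZE; // Alignment of the segment

    /* The ELF format expects p_vaddr and p_offset to agree modulo p_align */
    assert(p.p_vaddr % p.p_align == p.p_offset % p.p_align);
    return p;
}
```

Under these assumptions, a byte at file location 120 ends up at memory location `base_vaddr + 120`.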

Arbitrary numbers and where to find them

I was initially planning to make a 32-bit ELF, so early poking around on various threads led me to the magical "somewhere above 0x8048000" number, which is ~128 MiB. In 64-bit land, the 0x4000000 address seemed to be used, so this is the origin memory address used going forward. A thread that explains some of the magic that these numbers represent is here.
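
To see where the "~128 MiB" figure comes from, the addresses can be split into whole mebibytes plus a remainder. The helpers below are a small illustrative sketch with hypothetical names, not code from YABFC.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MIB (1024ULL * 1024ULL) /* Bytes in one MiB */

/* Whole mebibytes contained in an address */
uint64_t sketch_whole_mib(uint64_t addr)
{
    return addr / SKETCH_MIB;
}

/* Leftover kibibytes after removing the whole mebibytes */
uint64_t sketch_leftover_kib(uint64_t addr)
{
    return (addr % SKETCH_MIB) / 1024;
}
```

For example, `assert(sketch_whole_mib(0x8048000) == 128)` holds, with `sketch_leftover_kib(0x8048000)` giving 288, so 0x8048000 sits 288 KiB above the 128 MiB mark. By the same arithmetic, 0x4000000 is exactly 64 MiB.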

List of resources
