What goes around comes around, and I’m waiting for knowledge. COOLGUY B)

Prerequisites:

Linux experience
C
basic ASM

I’ve been wanting to write about ELF, the executable format used by Linux, for a long, long time. This post is planned to be pretty in-depth…so buckle up, put on your big boy pants, take some adderall and get ready to learn.

Why the post on ELF for a pwn blog?

Well, knowing how ELF works is a pretty big part of exploitation. Knowing the sections, what they hold, what permissions they carry, etc. is a large part of reverse engineering and pwn itself. Without knowing the basics of ELF, you won’t get very far in the security game.

ELF is extensive; dynamic linking, symbol tables, and a lot of other things make it a very large format. In this blog post, I will cover dynamic linking, relocations, symbols, and a few other things.

$ man elf

elf - format of Executable and Linking Format (ELF) files

An executable file using the ELF file format consists of an ELF header, followed by a program header table or a section header table, or both. The ELF header is always at offset zero of the file. The program header table and the section header table’s offset in the file are defined in the ELF header. The two tables describe the rest of the particularities of the file.

ELF Types

An ELF file’s type, stored in the e_type field of the ELF header, is one of:

ET_NONE : An unknown type.
ET_REL : A relocatable file, i.e. an object file. It holds code and data that the linker combines into an ET_EXEC or ET_DYN file.
ET_EXEC : An executable file.
ET_DYN : A shared object, i.e. a shared library. Position-independent executables (PIE) are also ET_DYN.
ET_CORE : A core file.

We can view an ELF file’s header, which shows its type among other things, with the readelf -h command; readelf is part of GNU binutils, which most Linux distributions ship. Here is an example run on a file called elf.

[Image: readelf -h output for the file elf]

As you can see, there are a few things going on: readelf reports that it’s an EXEC file, and shows the entry point address, header table information (we’ll talk more on that soon), etc. You might be wondering: if there is ASLR, isn’t a fixed entry point a huge issue? Well, no. The entry point is just the address where execution starts, usually somewhere in .text; with gcc it is the symbol “_start”. The linker sets it when the file is created. An ET_EXEC binary is always loaded at the addresses recorded in the file, so ASLR doesn’t move it; a PIE binary (ET_DYN) stores the entry point as an offset that gets added to the randomized load address.
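To make the header a bit more concrete, here is a minimal sketch of pulling the type and entry point out of a file image that is already in memory. It assumes a 64-bit ELF and a Linux system that provides <elf.h>; the helper names elf_type_name and read_elf64_header are made up for this post and are not part of any library.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map an e_type value to a short name (hypothetical helper). */
const char *elf_type_name(uint16_t e_type)
{
    switch (e_type) {
    case ET_NONE: return "NONE";
    case ET_REL:  return "REL";
    case ET_EXEC: return "EXEC";
    case ET_DYN:  return "DYN";
    case ET_CORE: return "CORE";
    default:      return "UNKNOWN";
    }
}

/*
 * Copy the ELF64 header out of an in-memory file image (hypothetical helper).
 * Returns 0 on success, -1 if the buffer is too small, lacks the
 * "\x7fELF" magic, or is not a 64-bit ELF.
 */
int read_elf64_header(const unsigned char *image, size_t len, Elf64_Ehdr *out)
{
    assert(image != NULL && out != NULL);
    if (len < sizeof(*out))
        return -1;
    if (memcmp(image, ELFMAG, SELFMAG) != 0)
        return -1;
    if (image[EI_CLASS] != ELFCLASS64)
        return -1;
    memcpy(out, image, sizeof(*out)); /* the header always sits at offset 0 */
    return 0;
}
```

After a successful call, out->e_type can be passed to elf_type_name, and out->e_entry holds the entry point address discussed above.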

[Image: disassembly of _start]

Here you can see the entry point address matches the address of _start, which will invoke “__libc_start_main@plt”. Quite the function, right? We’ll be talking more about that…later on.

ELF Program Headers

ELF program headers describe the segments in a binary and are necessary for every process. Segments tell the loader how the program should be mapped into memory, so they are very important to it. Let’s check out some code (from man elf):

typedef struct {
    uint32_t   p_type;
    uint32_t   p_flags;
    Elf64_Off  p_offset;
    Elf64_Addr p_vaddr;
    Elf64_Addr p_paddr;
    uint64_t   p_filesz;
    uint64_t   p_memsz;
    uint64_t   p_align;
} Elf64_Phdr;

This is the Elf64_Phdr structure; each entry in the program header table is one of these. Similarly, here is the code for the 32-bit Phdr:

typedef struct {
    uint32_t   p_type; // segment type
    Elf32_Off  p_offset; // segment offset
    Elf32_Addr p_vaddr; // segment virtual address
    Elf32_Addr p_paddr; // segment physical address
    uint32_t   p_filesz; // segment size in the file
    uint32_t   p_memsz; // segment size in memory
    uint32_t   p_flags; // segment permissions: PF_R | PF_W | PF_X. Text segment = PF_R and PF_X; data segment = PF_R and PF_W.
    uint32_t   p_align; // segment alignment
} Elf32_Phdr;

The two carry the same fields; only the field widths differ, and p_flags sits in a different position. Some information on p_type, which gives the segment type:

PT_NULL: Array element unused. This lets the program header table have ignored entries.
PT_LOAD: An EXEC will always have at least one segment of this type. The segment is going to be loaded (mapped) into memory.
PT_DYNAMIC: Program header for the dynamic segment, present in dynamically linked binaries. It tells the dynamic linker where to find things such as the relocation entries and the GOT.
PT_PHDR: This segment is the program header table itself; its location and size are recorded here.
PT_INTERP: Tells us where the interpreter (the dynamic linker) is, e.g. /lib64/ld-linux-x86-64.so.2 on x86-64.

You can view these with readelf -l; output below.

[Image: readelf -l output showing the program headers]

As you can see, a few of the segment types we discussed show up. You might notice two PT_LOAD entries; this is because the binary has both a text segment and a data segment. The text segment contains the code; the data segment contains the global variables and such.
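Here is a small sketch of how you might inspect a program header table once it is in memory: one helper renders p_flags as a permission string, the other counts the PT_LOAD entries. It assumes Linux’s <elf.h> and 64-bit headers; phdr_flags_string and count_load_segments are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Render p_flags as a 3-character string such as "R E" or "RW ". */
void phdr_flags_string(uint32_t p_flags, char out[4])
{
    assert(out != NULL);
    out[0] = (p_flags & PF_R) ? 'R' : ' ';
    out[1] = (p_flags & PF_W) ? 'W' : ' ';
    out[2] = (p_flags & PF_X) ? 'E' : ' ';
    out[3] = '\0';
}

/* Count the PT_LOAD entries in a program header table already in memory. */
size_t count_load_segments(const Elf64_Phdr *phdrs, size_t count)
{
    assert(phdrs != NULL || count == 0);
    size_t loads = 0;
    for (size_t i = 0; i < count; i++) {
        if (phdrs[i].p_type == PT_LOAD)
            loads++;
    }
    return loads;
}
```

For the kind of binary described above, you would expect the text segment to come out as "R E" and the data segment as "RW ".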

ELF Section Headers

Sections are distinct from segments; sections live inside segments. A segment contains data, and this data can be one or many sections. The section header table (SHT) just lists the different sections and where each one lives. Unlike the program header table (PHT), which explicitly states the memory layout of the program, the SHT is not necessary for program execution. Just because it isn’t there doesn’t mean that the sections don’t exist! It just means the section headers were stripped.

The SHT is very important for debugging, so finding a binary without it is pretty terrible, frankly. You won’t be able to look up sections or symbols by name, which makes tools such as readelf, objdump, gdb, radare2, etc. far less useful.

Common Sections:

.text: the program’s code
.data: initialized global variables
.plt : we’ll talk about this later
.got : ^
.got.plt: ^
.rodata: read only data
.ctors/.dtors: constructors and destructors, executed before and after main() respectively
.bss: uninitialized global variables
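To show how section names are actually found, here is a rough sketch that walks the section header table of an ELF64 image held in memory and looks a section up by name through the section name string table (e_shstrndx). It assumes a well-formed, suitably aligned 64-bit file and Linux’s <elf.h>; find_section is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return the section header whose name matches `name`, or NULL if the
 * image has no usable section header table or no such section.
 */
const Elf64_Shdr *find_section(const unsigned char *image, size_t len,
                               const char *name)
{
    assert(image != NULL && name != NULL);
    if (len < sizeof(Elf64_Ehdr))
        return NULL;

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)image;
    if (eh->e_shoff == 0 || eh->e_shnum == 0 || eh->e_shstrndx >= eh->e_shnum)
        return NULL; /* no section header table, e.g. it was stripped */
    if (eh->e_shoff > len ||
        (len - eh->e_shoff) / sizeof(Elf64_Shdr) < eh->e_shnum)
        return NULL; /* table would run past the end of the image */

    const Elf64_Shdr *sh = (const Elf64_Shdr *)(image + eh->e_shoff);
    const Elf64_Shdr *strhdr = &sh[eh->e_shstrndx];
    if (strhdr->sh_offset > len || strhdr->sh_size > len - strhdr->sh_offset)
        return NULL;

    const char *names = (const char *)image + strhdr->sh_offset;
    size_t want = strlen(name) + 1; /* compare the terminating NUL too */
    for (size_t i = 0; i < eh->e_shnum; i++) {
        uint64_t off = sh[i].sh_name; /* offset into the name string table */
        if (off < strhdr->sh_size && want <= strhdr->sh_size - off &&
            memcmp(names + off, name, want) == 0)
            return &sh[i];
    }
    return NULL;
}
```

On a binary whose section headers were stripped, this returns NULL for every name even though the code and data are still there, which is exactly the situation described above.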

ELF Symbols

Symbols are a reference to a function or variable in the code, such as scanf(). scanf will have a symbol in .dynsym, the dynamic symbol table, which the dynamic linker uses to find its address. There are two symbol tables: .dynsym and .symtab. The dynamic symbol table only contains the symbols needed for dynamic linking, such as functions imported from libc, while the symbol table contains the entries of the dynamic symbol table plus the program’s own local functions and variables. Why are there two tables? The .dynsym is needed by the dynamic linker to resolve library addresses at runtime; the .symtab is not, which is why you might notice that production binaries remove .symtab altogether to save space.
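Each entry in either table is an Elf64_Sym (on 64-bit), and its st_info byte packs the binding and the type together. Below is a small sketch of decoding one entry with the macros from <elf.h>; the function names are hypothetical, and only a handful of bindings and types are covered.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Binding: who can see the symbol. */
const char *sym_bind_name(const Elf64_Sym *sym)
{
    assert(sym != NULL);
    switch (ELF64_ST_BIND(sym->st_info)) {
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "OTHER";
    }
}

/* Type: what kind of thing the symbol names. */
const char *sym_type_name(const Elf64_Sym *sym)
{
    assert(sym != NULL);
    switch (ELF64_ST_TYPE(sym->st_info)) {
    case STT_NOTYPE:  return "NOTYPE";
    case STT_OBJECT:  return "OBJECT";  /* a variable */
    case STT_FUNC:    return "FUNC";    /* a function */
    case STT_SECTION: return "SECTION";
    case STT_FILE:    return "FILE";
    default:          return "OTHER";
    }
}

/* An undefined symbol lives elsewhere, e.g. scanf imported from libc. */
int sym_is_undefined(const Elf64_Sym *sym)
{
    assert(sym != NULL);
    return sym->st_shndx == SHN_UNDEF;
}
```

An imported function like scanf would typically show up as a GLOBAL FUNC symbol that is undefined in the binary itself.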

Dynamic linker: relocations

Relocation is the process of connecting a symbolic reference with the thing it refers to. For example, when a program calls a function, the program must call the proper address. Let’s give an example. Say your spouse ordered pizza while you weren’t home, and before you get home, she sends you a text message saying “The pizza is on top of the kitchen counter.” - you know where it is before you get home, and once you get home, you know where to go to find pizza! Let’s see how relocations are defined:

typedef struct {
    Elf64_Addr r_offset; // location that requires relocation
    uint64_t   r_info; // symbol index and type of relocation
} Elf64_Rel;

You might, sometimes, even see something such as:

typedef struct {
        Elf64_Addr r_offset;
        uint64_t   r_info;
        int64_t    r_addend;
} Elf64_Rela;

The second structure is used when a relocation carries an explicit addend; x86-64 uses Rela entries. Each relocation has a specific type, defined in the ABI. Here’s an example:

extern int a;
int main(){
 return a;
}

Compiled with gcc -c

[Image: relocation entries of the compiled object file]

We see here the type of relocation which is occurring, along with its offset and symbol name. Here’s a small list of the possible relocation types for x64.

[Image: list of x86-64 relocation types]
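To tie r_offset, r_info and r_addend together, here is a minimal sketch that names a few x86-64 relocation types and applies the simplest one, R_X86_64_64, whose value is the symbol address plus the addend (S + A). It assumes Linux’s <elf.h>; x86_64_reloc_name and apply_reloc_64 are hypothetical helpers, and a real linker handles many more types than this.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Name a few common x86-64 relocation types. */
const char *x86_64_reloc_name(const Elf64_Rela *rela)
{
    assert(rela != NULL);
    switch (ELF64_R_TYPE(rela->r_info)) {
    case R_X86_64_64:        return "R_X86_64_64";
    case R_X86_64_PC32:      return "R_X86_64_PC32";
    case R_X86_64_PLT32:     return "R_X86_64_PLT32";
    case R_X86_64_GLOB_DAT:  return "R_X86_64_GLOB_DAT";
    case R_X86_64_JUMP_SLOT: return "R_X86_64_JUMP_SLOT";
    case R_X86_64_RELATIVE:  return "R_X86_64_RELATIVE";
    default:                 return "other";
    }
}

/*
 * Apply an R_X86_64_64 relocation to a section image: write S + A as a
 * 64-bit value at r_offset, where sym_value is S (the resolved symbol
 * address). Returns 0 on success, -1 for other types or bad offsets.
 */
int apply_reloc_64(unsigned char *section, size_t size,
                   const Elf64_Rela *rela, uint64_t sym_value)
{
    assert(section != NULL && rela != NULL);
    if (ELF64_R_TYPE(rela->r_info) != R_X86_64_64)
        return -1;
    if (rela->r_offset > size || size - rela->r_offset < sizeof(uint64_t))
        return -1;
    uint64_t value = sym_value + (uint64_t)rela->r_addend;
    memcpy(section + rela->r_offset, &value, sizeof value);
    return 0;
}
```

The symbol index, which says which symbol table entry the relocation refers to, is available through ELF64_R_SYM(rela->r_info).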

ELF Dynamic Linking

When a program is dynamically linked, the dynamic linker maps shared libraries into the process at load time. Shared libraries are position-independent code, so they can be placed basically anywhere in the process’s address space; they don’t need an address space of their own.

A shared library is a file of e_type ET_DYN, which we spoke about above. When a program depends on a shared library, the interpreter (the dynamic linker named by the PT_INTERP segment, i.e. the .interp section) takes care of finding it, loading it and running its initialization code.

The dynamic linker must modify the GOT (Global Offset Table), which is a table of addresses located in the data segment. It is in the data segment because it must be writable; however, a lot of binaries are built with RELRO, which marks the GOT read-only once the initial writing is done. Partial RELRO leaves .got.plt writable, while full RELRO resolves everything at startup and makes .got.plt read-only too. The dynamic linker fills the GOT with the resolved addresses.
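If you want to see what the dynamic linker actually mapped into a running process, glibc offers dl_iterate_phdr, which calls you back once per loaded object. The sketch below assumes glibc on Linux; list_loaded_objects and print_object are names made up for this post.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>
#include <stdio.h>

/* Called once per loaded object; prints its load address and name. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    (void)size;
    size_t *count = data;
    assert(info != NULL && count != NULL);
    /* The name can be empty, which is typically the case for the main program. */
    const char *name = (info->dlpi_name && info->dlpi_name[0])
                           ? info->dlpi_name
                           : "(no name)";
    printf("%#lx %s\n", (unsigned long)info->dlpi_addr, name);
    (*count)++;
    return 0; /* returning 0 keeps the iteration going */
}

/* Print every object mapped into this process and return how many there are. */
size_t list_loaded_objects(void)
{
    size_t count = 0;
    dl_iterate_phdr(print_object, &count);
    return count;
}
```

Calling list_loaded_objects() from a dynamically linked program should list the program itself along with libc and any other shared libraries; the exact addresses will differ from run to run when ASLR is on.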

PLT and GOT

More relocations, really. We spoke about how the dynamic linker gets the addresses needed for external functions (those in shared libraries), so we already know that those addresses, e.g. the address of printf(), will be stored in the GOT, as said above. Then what is the PLT (Procedure Linkage Table)? Well, the PLT is what jumps to the address. While the GOT holds the address, the PLT jumps to it.

Let’s take a look at an example.

[Image: disassembly of main and the puts@plt entry]

We can see main invokes a call to puts@plt.

You can then see the jump through the GOT entry, which holds the actual address of puts(). However, it is important to note that, by default, the GOT does not hold these addresses when the program starts. Each one is resolved lazily, the first time the function is called, unless immediate binding is enabled, which is the case for binaries with full RELRO.
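Lazy binding is easier to picture with a toy model. The sketch below is not how the real PLT is implemented (that is a few machine instructions plus the dynamic linker’s resolver); it only imitates the idea in plain C with a one-slot function pointer table. Every name in it is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

/* Stand-in for a function that lives in a shared library. */
static int real_double(int x) { return 2 * x; }

static int resolver_stub(int x);

/* Toy one-slot "GOT": it starts out pointing at the resolver, not the target. */
static func_t toy_got[1] = { resolver_stub };
static int resolver_calls = 0;

/*
 * Stand-in for the dynamic linker's resolver: find the real address,
 * patch the GOT slot, then forward the original call.
 */
static int resolver_stub(int x)
{
    resolver_calls++;
    toy_got[0] = real_double;
    return toy_got[0](x);
}

/* Stand-in for the PLT entry: an indirect jump through the GOT slot. */
int double_via_plt(int x)
{
    assert(toy_got[0] != NULL);
    return toy_got[0](x);
}

/* How many times the slow path (the resolver) has run so far. */
int toy_resolver_calls(void)
{
    return resolver_calls;
}
```

The first call to double_via_plt goes through the resolver and patches the slot; every later call jumps straight to real_double. With immediate binding, the slot would already hold the final address before the first call, which is what lets full RELRO make the table read-only.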