Understanding Symbols Relocation
Relocation of symbols is conceptually a simple operation: when compiling/running a program, the references to symbols has to be replaced by their real location in memory. But under the hood, when relocation process takes place ? When are the relocation sections used ?
To answer my questions, I experimented on a x86_64 architecture (Intel Core i7).
In the following text, when I talk about linker or link-editor, I mean the program that takes several object files and link them altogether to produce either an executable or a shared library.
The dynamic linker is a piece of code that is executed alongside an executable to resolve the dynamic symbols at runtime.
Simple Case: Static Linkage
Let’s start with the simplest case: we will statically link an executable.
#include "nothing.h"
int main(int argc, const char *argv[])
{
doAlmostNothing();
return 0;
}
And the called code:
#include "nothing.h"
static void doNothingStatic() {}
void doNothing() {}
void doAlmostNothing()
{
doNothingStatic();
doNothing();
}
The function doAlmostNothing
calls the exported function doNothing
and
statically linked function doNothingStatic
. doNothingStatic
is local to the
generated object file, hence the compiler is able to compute the good address.
On the contrary, doNothing
can be reference by another object file and used
when linking an executable. To produce an executable, the link-editor will have
to place a doNothing
somewhere and replace all the reference to it by its
effective address.
We disassemble the nothing.o
:
> objdump -d nothing.o
nothing.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <doNothingStatic>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: 90 nop
5: 5d pop %rbp
6: c3 retq
0000000000000007 <doNothing>:
7: 55 push %rbp
8: 48 89 e5 mov %rsp,%rbp
b: 90 nop
c: 5d pop %rbp
d: c3 retq
000000000000000e <doAlmostNothing>:
e: 55 push %rbp
f: 48 89 e5 mov %rsp,%rbp
12: b8 00 00 00 00 mov $0x0,%eax
17: e8 e4 ff ff ff callq 0 <doNothingStatic>
1c: b8 00 00 00 00 mov $0x0,%eax
21: e8 00 00 00 00 callq 26 <doAlmostNothing+0x18>
26: 90 nop
27: 5d pop %rbp
28: c3 retq
Looking at offset 17, we can see the call to doNothingStatic
. This function
is local to the file, so its offset can be directly written. Due to
little-endianess of x86 architecture, 0xffffffe4 is -1c bytes from the next
instruction pointer value which is 0x1c. Hence, this is a call to the function
written at address 0x0 which is doNothingStatic
.
On the contrary, the compiler did not put the address of the doNothing
function, although he could give an address if he assumes the code is linearly
mapped in memory. I don’t know, maybe it is a convention. I keep this question
for latter. Anyway this gives us the opportunity to explain a basic relocation.
If we only look at the bytes in the assembler code (the translation in readable assembler code makes use of the sections we will describe now to show which function is called), we can see the 4 bytes (32 bits) are zeros. It will be the role of the linker to fill such portion of the assembler code with correct values when the object file has to be used.
But the linker cannot magically guess which values to put in the final binary
file. The compiler will put some information in ELF sections that are dedicated
to the relocations: depending on the targeted architecture, the involved
section are .rel.text
(x86_32) or .rela.text
(x86_64).
> readelf -r nothing.o
Relocation section '.rela.text' at offset 0x250 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000022 000900000002 R_X86_64_PC32 0000000000000007 doNothing - 4
(On recent versions of GCC, the relocation type has changed to R_X86_64_PLT32
on x86_64
. This is due to the commit. This does not
change anything for static linkage explanation and mentioning the PLT now would
be confusing: go to the dynamic link paragraph to know more about the PLT)
What this says to the link editor is: “Be careful, what is at offset 22 has to be replaced by an address that can be calculated in the way described by the relocation type X86_64_PC32. For such a calculation, you can use the value (here 0x7) and the addend (here -4)”. The type of relocation tells the linker how to calculate the effective address. In this case S + A - P where:
- S: The value of the symbol whose index resides in the relocation entry.
- A: The addend used to compute the value of the relocatable field.
- P: The section offset or address of the storage unit being relocated
Can we validate this in the produced executable ?
Here is a final binary produced by the linker:
0000000000000660 <main>:
660: 55 push %rbp
661: 48 89 e5 mov %rsp,%rbp
664: 48 83 ec 10 sub $0x10,%rsp
668: 89 7d fc mov %edi,-0x4(%rbp)
66b: 48 89 75 f0 mov %rsi,-0x10(%rbp)
66f: b8 00 00 00 00 mov $0x0,%eax
674: e8 15 00 00 00 callq 68e <doAlmostNothing>
679: b8 00 00 00 00 mov $0x0,%eax
67e: c9 leaveq
67f: c3 retq
0000000000000680 <doNothingStatic>:
680: 55 push %rbp
681: 48 89 e5 mov %rsp,%rbp
684: 90 nop
685: 5d pop %rbp
686: c3 retq
0000000000000687 <doNothing>:
687: 55 push %rbp
688: 48 89 e5 mov %rsp,%rbp
68b: 90 nop
68c: 5d pop %rbp
68d: c3 retq
000000000000068e <doAlmostNothing>:
68e: 55 push %rbp
68f: 48 89 e5 mov %rsp,%rbp
692: b8 00 00 00 00 mov $0x0,%eax
697: e8 e4 ff ff ff callq 680 <doNothingStatic>
69c: b8 00 00 00 00 mov $0x0,%eax
6a1: e8 e1 ff ff ff callq 687 <doNothing>
6a6: 90 nop
6a7: 5d pop %rbp
6a8: c3 retq
6a9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
Here we spot 2 things:
- The call to
doNothingStatic
has not changed. In fact, the linker only treats the.text
section has raw byte stream and simply concatenates all those sections from all object files. The call todoNothingStatic
was already a relative jump from the next instruction to execute. - The linker calculated that call to
doNothing
was a jump to0x6a6 + 0xffffffe1 = 0x687
. Here the.text
section ofnothing.o
starts at 0x680. The linker knows from the relocation section that it will have to change the value at0x6a2 (=0x680 + 0x22)
so that it jumps towards0x687 (=0x680 + 0x7)
. The relocation being of typeR_X86_64_PC32
, the value will be relative to the PC (Program Counter), the IP register will be0x6a6 (=0x6a2 + 4 bytes = 0x680 + 0x22 + 0x4)
. The relative jump will then be:0x687 - 0x6a6 = 0x680 + 0x7 - (0x680 + 0x22 + 0x4) = 0x7 - 0x4 - 0x22 = - 0x1f
which is0xffffffe1
in complement to two. We recognize here what was in the relation section with S = 0x7, A = -4 and P = 0x22.
There are a few interesting things to say about the main
function. The linker
also operated a relocation to the doAlmostNothing
function. Let us see the
relocation information from the object file containing the main function:
> readelf -s --wide prog0.o | grep doAlmostNothing
10: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND doAlmostNothing
> readelf -r prog0.o
Relocation section '.rela.text' at offset 0x208 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000015 000a00000004 R_X86_64_PLT32 0000000000000000 doAlmostNothing - 4
The undefined symbol doAlmostNothing
will have to be relocated. This time the
type of the relocation is R_X86_64_PLT32
. We will see later that in the case of
a position-independent code, the call to a function be done through a table
called the Procedure Linkage Table which is used by the dynamic linker at
runtime.
This type has been chosen in case we would link the nothing.o in a shared
library and link the executable with this dynamic library. In the case all is
statically linked, the linker will consider it will have to do the same job as
if the relocation type was R_X86_64_PC32
relocation (as written in the gold
linker in x86_64.cc:3637).
Relocation when using Dynamic Libraries
Quick Introduction
When statically linking an executable, all the external functions the program relies on are stored in the final file. In fact, the link editor will concatenate all the .text parts into the final file. In the end, when the executable is run, all this code is mapped in memory.
Although simple, this approach has several drawbacks:
- if several programs uses the same functions, they will all have their own copy of the code of these functions. Clearly, on systems that allows several programs to run at the same time, some space on disk and in memory is wasted.
- if you detect a bug in one of the functions that is used by several programs, fixing this bug will require you to rebuild all the programs.
To cope with such drawbacks, we could put the shared code somewhere in memory so that the all the dependent programs would jump to this location to execute this common code. In fact, the virtual memory system will hide the real position of the dynamic library in physical memory.
This is the way the shared libraries works. But for this to work, it requires the introduction of new actors in the runtime environment. The link editor alone is no more able to resolve all the symbols because, by definition, it is not aware of the addresses of the shared code at runtime.
Hence, some kind of dynamic linker is required to relocate at runtime the
undefined symbols. On GNU/Linux, this special process is generally provided by
the glibc. An executable that depends upon shared libraries, holds a reference
to the path toward the dynamic linker to use. This path is stored in the
.interp
section of the executable:
> readelf -S prog1_dynamic.out
There are 31 section headers, starting at offset 0x1a80:
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .interp PROGBITS 0000000000000238 00000238
000000000000001c 0000000000000000 A 0 0 1
[...]
$hexdump -C prog1_dynamic.out
[...]
00000230 01 00 00 00 00 00 00 00 2f 6c 69 62 36 34 2f 6c |......../lib64/l|
00000240 64 2d 6c 69 6e 75 78 2d 78 38 36 2d 36 34 2e 73 |d-linux-x86-64.s|
00000250 6f 2e 32 00 04 00 00 00 10 00 00 00 01 00 00 00 |o.2.............|
..[.]
When running this executable, the /lib64/ld-linux-x86-64.so.2
will somehow
have to start and handle the undefined symbols.
Position Independent Code
When we think about it, the job of the dynamic linker could be simple. Based on PC-Relative relocations inserted by the link-editor, it could put the real addresses of the called function/accessed variables at the call locations.
This has two drawbacks:
- this would mean when the program starts, the dynamic linker would have to perform all (and probably a lot) of relocations impacting the program startup time.
- this would also mean the dynamic linker would modify the program code loaded in memory. Nowadays, for security reasons, the executable code is stored in read-only memory pages. For such systems, this is not impossible: the dynamic linker would have the additional work of changing the permission on memory pages to RW and to set it back to RO after the content has been patched.
As usual in computer sciences, the solution consists in adding an indirection layer. This indirection will be performed by the Global Offset Table (GOT) and the Procedure Linkage Table (PLT).
> readelf --segments prog0_dynamic.out
Elf file type is DYN (Shared object file)
Entry point 0x650
There are 9 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000001f8 0x00000000000001f8 R E 0x8
INTERP 0x0000000000000238 0x0000000000000238 0x0000000000000238
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000000096c 0x000000000000096c R E 0x200000
LOAD 0x0000000000000dc8 0x0000000000200dc8 0x0000000000200dc8
0x0000000000000268 0x0000000000000270 RW 0x200000
[...]
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.ABI-tag .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .plt.got .text .fini .rodata .eh_frame_hdr .eh_frame
03 .init_array .fini_array .jcr .dynamic .got .got.plt .data .bss
[...]
As we can see, in the previous readelf
output, those tables .got
and
.got.plt
will be loaded in Read/Write memory pages (to cope with the security
limitations) and will be filled at runtime:
- at program startup for global variables (
.got
) - on the first call to a function (
.got.plt
)
This implies that an object file to be included in the shared library cannot write PC-Relative or absolute relocation information. It will have to indicate that the call/access will have to be done via the PLT/GOT.
Hence, when compiling some code to be embedded in shared library, we must
require the compiler to generate Position Independent Code. This can be done
using the -fPIC
option as shown in the following example:
> gcc -Wall -g -O0 -fPIC -c nothing.c -onothing_pic.o
> gcc -shared -o libnothing.so nothing_pic.o
Calling a Shared Library Function
In the previous example, there was a PC-relative relocation for the symbol
doAlmostNothing
. This was possible because the linker knew where the function
was located.
If we put this function in a shared library and re-link the program with this
library, the link editor and the dynamic linker will cooperate to call
doAlmostNothing
. The link editor will put some special relocation type that
will be used by the dynamic linker to locate the function to call. How does it
work under the hood ?
Comparing the dynamic and the static version shows a difference in the way the
doAlmostNothing
is called.
> objdump -d -s prog0.out
[...]
674: e8 15 00 00 00 callq 68e <doAlmostNothing>
[...]
$objdump -d -s prog0_dynamic.out
[...]
794: e8 97 fe ff ff callq 630 <doAlmostNothing@plt>
[...]
We can see the execution does not jump directly to the code of the function but to an intermediary code linked to the PLT (Procedure Linkage Table) we had a few words about:
0000000000000630 <doAlmostNothing@plt>:
630: ff 25 e2 09 20 00 jmpq *0x2009e2(%rip) # 201018 <doAlmostNothing>
636: 68 00 00 00 00 pushq $0x0
63b: e9 e0 ff ff ff jmpq 620 <.plt>
Which itself seems to jump to a common piece of code (see 0x630
):
0000000000000620 <.plt>:
620: ff 35 e2 09 20 00 pushq 0x2009e2(%rip) # 201008 <_GLOBAL_OFFSET_TABLE_+0x8>
626: ff 25 e4 09 20 00 jmpq *0x2009e4(%rip) # 201010 <_GLOBAL_OFFSET_TABLE_+0x10>
62c: 0f 1f 40 00 nopl 0x0(%rax)
As objdump
is gentle enough to resolve the addresses involved at 620 and 626,
this .plt
section (loaded in executable segments) references 2 hard coded
address entries in the Global Offset Table (GOT): the entries 2 and 3 of the
GOT.
If we look at how the segments are mapped in memory, we can see the .plt
section is in READ and EXEC memory pages. This section is a set of functions
which aim at finding the address of the function to call. The terminology used
in the glibc is trampoline (see. sysdeps/x86_64/dl-trampoline.S in the glibc
source code).
So what happened to call the doAlmostNothing
? Let’s try to tidy this mess to
understand who is involved in such a call.
The linker can see that a call to doAlmostNothing
will have to be performed
(but the location of the code is not known at link-editor phase). It will:
- create a section
.got.plt
(if does not already exist) - write the address of the
.dynamic
section in the first entry of the.got.plt
(if not already done) - add a relocation of type JUMP_SLOT (here
R_X86_64_JUMP_SLO
): the offset gives an address in the PLT where the effective address of the function will have to be set.
> readelf -r prog0_dynamic
Relocation section '.rela.plt' at offset 0x5f0 contains 1 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000201018 000100000007 R_X86_64_JUMP_SLO 0000000000000000 doAlmostNothing + 0
>readelf --sections prog0_dynamic
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[...]
[23] .got PROGBITS 0000000000200fd0 00000fd0
0000000000000030 0000000000000008 WA 0 0 8
[24] .got.plt PROGBITS 0000000000201000 00001000
0000000000000020 0000000000000008 WA 0 0 8
[...]
The .got
and .got.plt
will be loaded on Read/Write memory pages so that it
can be updated at runtime as you can see by using readelf --segments
.
The linker set the content at 0x201018 (in the GOT) to the address of the
second instruction of <doAlmostNothing@plt>
.
That’s all for the link editor. Now when executing the program, for the
first call to doAlmostNothing
, the instruction at 0x630 is simply a jump to
0x636. This one, will push the index of the symbol in the GOT.
Then we jump to the magic code. To understand it, we must know that, at program startup, the dynamic linker set some values in the entries 2 and 3 of the GOT to call itself.
So, instruction at 626 calls the dynamic linker with the index in the GOT as a parameter. This way, it will perform 2 steps:
- resolve the address of the
doAlmostNothing
thanks to relocation information - store its address at the good index in the GOT
After this, the content at 0x201018 will be the real address of the function.
Hence, a second call to doAlmostNothing
will not require the dynamic linker
anymore.
This process can be visible within a debugger session:
(gdb) disas
Dump of assembler code for function doAlmostNothing@plt:
=> 0x0000555555554630 <+0>: jmpq *0x2009e2(%rip) # 0x555555755018
0x0000555555554636 <+6>: pushq $0x0
0x000055555555463b <+11>: jmpq 0x555555554620
End of assembler dump.
(gdb) x/a 0x555555755018
0x555555755018: 0x555555554636 <doAlmostNothing@plt+6>
This confirms the first line is, the first time, the address of the next
instruction. Let’s resume the execution, we break just after the call to
doAlmostNothing
.
(gdb) c
Continuing.
Breakpoint 2, main (argc=1, argv=0x7fffffffe7f8) at prog0.c:6
6 return 0;
(gdb) x/a 0x555555755018
0x555555755018: 0x7ffff7ff270e <doAlmostNothing>
And this time, the value at 0x555555755018 is now the effective address of the
doAlmostNothing
function. We can verify that this address points to the
executable memory space of shared library libnothing
:
$ cat /proc/<pid>/map
[...]
7ffff7ff2000-7ffff7ff3000 r-xp 00000000 08:01 389081 <path>/libnothing.so
[...]
Variable Symbol Relocations
So far, we only saw how function symbols were being relocated. What if a shared library exposes a global variable, that can be used at the same time locally by the library and externally by a program that depends on the library ? This time again, the dynamic linker will use relocations information provided by the link editor to locate the address of this variable.
We can imagine a shared library that defines a string kExternString
and also
a function printExternalString
that prints that variable out. An executable
call this method and also directly print the variable.
> readelf -r libprinter.so.1
Relocation section '.rela.dyn' at offset 0x520 contains 11 entries:
Offset Info Type Sym. Value Sym. Name + Addend
[...]
000000200ff0 000c00000006 R_X86_64_GLOB_DAT 0000000000201040 kExternString + 0
[...]
> readelf -r prog1_dynamic.out
Relocation section '.rela.dyn' at offset 0x578 contains 10 entries:
Offset Info Type Sym. Value Sym. Name + Addend
[...]
000000201038 000b00000005 R_X86_64_COPY 0000000000201038 kExternString + 0
[...]
There are two types of relocation we have not met yet:
- from the shared library, R_X86_64_GLOB_DAT
- from the executable, R_X86_64_COPY
R_X86_64_GLOB_DAT relocation is triggered by the internal call by
printExternalString
: it gives the offset where to find the variable
value is stored.
R_X86_64_COPY tells the dynamic linker to copy the address of the value in the
GOT at address given by the offset member (here 0x000000201038
). This way the
code will access the variable via the GOT.
The dynamic linker knows where the kExternString
is located: it is calculated
from the load address of the shared library + the value of the symbol (taken
from the dynamic symbols table .dynsym
). In our case:
> readelf --symbols libprinter.so
Symbol table '.dynsym' contains 17 entries:
Num: Value Size Type Bind Vis Ndx Name
[...]
12: 0000000000201040 8 OBJECT GLOBAL DEFAULT 23 kExternString
[...]
If the library is loaded at 0x7ffff7bd7000
, the location of the
kExternString
(of the pointer toward the sequence of null-terminated
characters) will be 0x7ffff7dd8040
. This value is copied to the GOT.
(gdb) x/a 0x7ffff7bd7000 + 0x201040
0x7ffff7dd8040 <kExternString>: 0x7ffff7bd783d
Let us say now the read/write segment of my program is loaded at
0x555555755000
and the .got
section must be loaded at offset 0x30
(readelf --sections <prog>
), the first entry is 8 bytes further. Hence to
access kExternString
, its address will have to be taken at 0x555555755038
.
With those initial conditions, we can validate that, at runtime, the good
address is used:
(gdb) x/a 0x555555755038
0x555555755038 <kExternString>: 0x7ffff7bd783d
Conclusion
After this exercise, I have a clearer idea of the linkers job and how the relocations are handled. There are so much thing to dig into like the visibility of symbols, the way thread local storage is handled, the versioning of symbols. I will stop here. If a reader find an error, he can submit a Pull Request.
References
- The bright series of post by the author of the gold about linkers https://www.airs.com/blog/archives/38
- Oracle Linker and Libraries Guide
- Ulrich Drepper’s How To Write Shared Libraries