Reverse engineering 32 and 64 bits binaries with Radare2 - 1

Reverse engineering 32 and 64 bits binaries with Radare2 - 1

About this course

This series of post that I’m starting with this one will cover the fundamentals of reverse engineering with the “popular” radare2 framework. What I want to do here is to walk you through a diverse set of examples of C-written binaries that will cover the most common algorithms and data structures you can find in programs out there, so at the end you will be able to identify them and work with them. This won’t be an advanced/pro course but it will start from the very bottom and will cover far more than the average free course/tutorial.

On these series of posts we’ll start with an C source file / algorithm, compile it in both 32 bits and 64 bits systems and then reverse engineer it using radare2, so you’ll be able to appreciate the differences between 32 and 64 bit binaries. Once we cover the fundamentals of reversing we’ll focus on x64.

I assume that you have some fundamental knowledge about computer arquitectures and know some basic asm instructions such as mov, push and such. I also assume that you already know what radare2 is and thus want to finally learn how to use it.

Wanna show your support to this course? Follow me in Twitter! @artikblue If people show interest I’ll post more content :)

Getting radare2

Most of the people who use radare2 use it on Linux systems, as from there you can analyze all sorts of binaries and if you want to debug them you can connect to a remote debug session or use gbd on Linux. In here we’ll use r2 on Linux most of the time and move it to Windows when needed.

Radare2 supports a ton of arquitectures from from x64 to arduinos or tamagochis and you can analyze binaries related to those arquitectures from the comfort of your ubuntu system or whatever you use.

Said that, the best way to get radare2, as the webpage itself suggests is to clone it from the repositories:

git clone https://github.com/radare/radare2
cd radare2
sys/install.sh   # just run this script to update r2 from git

You can also install it from the apt/rpm/yaourt/whatever repo in most distributions with something like:

sudo apt-get install radare2

And if you wish to use it on windows you can download the windows setup from its website.

Hello world

Every programming journy begins with the classical hello world program and this one won’t be less. And a hello world in C language will look exactly like that:


#include <stdio.h>
int main() {
   printf("Hello, World!");
   return 0;
}

A program like that can be easily compiled using the GCC compiler like this:

gcc -w hello_world.c -o hello_word

And if we run it we are basically going to get a “Hello, World!” on our screen and that will be all, very simple but we need to begin with something :)

Once the program is compiled we can open it with r2 by using “radare2 program” where program is the program you want to analyze, r2 is also an alias of radare2 so you can use it instead if you want. Once the binary is loaded, we need radare to parse the program; running aaaa will analize the binary and detect data structures, functions calls and these kinds of elements. If you want to better understand how this works this answer on stackoverflow is quite explainatory.

lab@lab-VirtualBox:~$ radare2 c_examples/bin/hello_world 
 -- Use V! to enter into the visual panels mode (dwm style)
[0x08048310]> aaaa
[Cannot analyze at 0x08048300g with sym. and entry0 (aa)
[x] Analyze all flags starting with sym. and entry0 (aa)
[Cannot analyze at 0x08048300ac)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Check for objc references
[x] Check for vtables
[x] Type matching analysis for all functions (aaft)
[x] Propagate noreturn information
[x] Use -AA or aaaa to perform additional experimental analysis.
[x] Finding function preludes
[x] Enable constraint types analysis for variables
[0x08048310]> 

Once the program is loaded and parsed, plenty of actions can be done on it, such as listing strings, disassembling code blocks and more. One of the most common things you’ll want to do when you get your hands on an unknown binary is to get a list of its functions. You can do it using the afl command.

[0x08048310]> afl
0x08048310    1 33           entry0
0x080482f0    1 6            sym.imp.__libc_start_main
0x08048350    4 43           sym.deregister_tm_clones
0x08048380    4 53           sym.register_tm_clones
0x080483c0    3 30           entry.fini0
0x080483e0    4 43   -> 40   entry.init0
0x080484a0    1 2            sym.__libc_csu_fini
0x08048340    1 4            sym.__x86.get_pc_thunk.bx
0x080484a4    1 20           sym._fini
0x08048440    4 93           sym.__libc_csu_init
0x0804840b    1 46           main
0x080482e0    1 6            sym.imp.printf
0x080482ac    3 35           sym._init
[0x08048310]> 

In examples such as this one, the most interesting function is “main”. Entry* functions can be interesting as well but as this is a hello world program in functions like those you’ll probably find a bunch of code you won’t understand at all at this moment (some internal stuff added by the compiler) so we can leave them for now. Other functions we are seeing in this example are related to the C libraries/calls used by the progam (as the printf call you’ll probably identify there).

Another interesting initial thing we can do here is for example get the general information about the binary with iI:

[0x08048310]> iI
arch     x86
baddr    0x8048000
binsz    6115
bintype  elf
bits     32
canary   false
class    ELF32
compiler GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
crypto   false
endian   little
havecode true
intrp    /lib/ld-linux.so.2
laddr    0x0
lang     c
linenum  true
lsyms    true
machine  Intel 80386
maxopsz  16
minopsz  1
nx       true
os       linux
pcalign  0
pic      false
relocs   true
relro    partial
rpath    NONE
sanitiz  false
static   false
stripped false
subsys   linux
va       true

[0x08048310]> 

That will prompt values such as the architecture or the compiler. Those values are always interesting so we can know what we are dealing with and start discarting strategies focusing on what matters.

We can also list the imports with ii

[0x08048310]> ii
[Imports]
nth                vaddr              bind   type              name
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
577730550794551297 0xb6f9994600000000 FUNC   printf            
2                  0xb6f9994d00000000 NOTYPE __gmon_start__    
577730619514028035 0xb6f9994600000000 FUNC   __libc_start_main 

In this particular case, by know we should be sure that we are dealing with a simple program written in C.

Another command of our interest may be the iz one. That will list all of the strings contained within the data section of the program (izz will list strings in the whole file). That command is specially useful when dealing with simple crackmes (so we can identify hardcoded passwords).

In general terms knowing the strings inside a program is helpful when it comes to getting a general idea of “what the program is about”.

[0x08048310]> iz
[Strings]
nth           paddr             vaddr       len size     section type          string
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
5222680231936 0x80484c000000000 0xd00000000 14  34461128 ascii   Hello, World! bin.strings

[0x08048310]> 

As we can see, iz detects the “Hello, World!” string, the string that appears everytime we run the program. Radare also tells us about the section (ascii) and the location of the string.

Now that we know some of the very basic fundamentals let’s actually reverse the program and try to figure out what and how it does.

[0x08048310]> sf main
[0x0804840b]> pdb
            ; DATA XREF from entry0 @ 0x8048327
┌ 46: int main (int32_t arg_4h, char **argv, char **envp);
│           ; var int32_t var_4h @ ebp-0x4
│           ; arg int32_t arg_4h @ esp+0x24
│           0x0804840b      8d4c2404       lea ecx, [arg_4h]
│           0x0804840f      83e4f0         and esp, 0xfffffff0
│           0x08048412      ff71fc         push dword [ecx - 4]
│           0x08048415      55             push ebp
│           0x08048416      89e5           mov ebp, esp
│           0x08048418      51             push ecx
│           0x08048419      83ec04         sub esp, 4
│           0x0804841c      83ec0c         sub esp, 0xc
│           0x0804841f      68c0840408     push str.Hello__World       ; 0x80484c0 ; "Hello, World!" ; const char *format
│           0x08048424      e8b7feffff     call sym.imp.printf         ; int printf(const char *format)
│           0x08048429      83c410         add esp, 0x10
│           0x0804842c      b800000000     mov eax, 0
│           0x08048431      8b4dfc         mov ecx, dword [var_4h]
│           0x08048434      c9             leave
│           0x08048435      8d61fc         lea esp, [ecx - 4]
└           0x08048438      c3             ret
[0x0804840b]> 

The first thing that we have to understand here is that, we are inspecting main, and main is a function, so it has to return to somewhere and it may receive arguments or need space for local variables, operations related to those aspects are performed at the beggining and the end of the code.

The part that we’ll have to focus on here starts with push str.Hello__World and then the call to printf. In 32 bits systems the way of passing parameters to functions consists of pushing those parameters to the stack and then executing a call operation on the address of the function. So something like “printf(a);” in C would be a “push a; call printf” in asm. So what about the rest of the code? Well, the first part of the code is related establishing a new stack frame and a stack frame is nothing more than a section of the code that will contain local variables of the function, arguments passed and values such as that. The end of the code

x64 binaries and its particularities

Let’s repeat the process now but compiling the same program and opening it in r2 in an x64 system.

[0x00000540]> sf sym.main
[0x0000064a]> pdb
            ;-- main:
/ (fcn) sym.main 28
|   sym.main ();
|              ; DATA XREF from 0x0000055d (entry0)
|           0x0000064a      55             push rbp
|           0x0000064b      4889e5         mov rbp, rsp
|           0x0000064e      488d3d9f0000.  lea rdi, qword str.Hello__World ; 0x6f4 ; "Hello, World!" ; const char * format
|           0x00000655      b800000000     mov eax, 0
|           0x0000065a      e8c1feffff     call sym.imp.printf         ; int printf(const char *format)
|           0x0000065f      b800000000     mov eax, 0
|           0x00000664      5d             pop rbp
\           0x00000665      c3             ret
[0x0000064a]> 

The main difference between a x32 program and a x64 one in small examples like this one is that, as you may have seen parameters to functions aren’t passed by the stack. In x64 programs like this one, parameters are passed using registers such as rdi, rsi, rdx, rcx, r8 and r9 in that order. We’ll keep walking through the x64 architecture with clear and specific examples but you may want to consider reading a cheatsheet like this one to gain a more solider understanding on topics such as this last one

There is nothing much more to comment by now.

If we look at the code, in this case, at the beggining of the code, we can appreciate a “push rbp, mov ebp, rsp” This is related to the stack alignment. Then what comes next is way more clear, with lea we load the effective address of the null terminated string “Hello, World!” to the RDI register, then we load the hex value 0 to eax and we proceed to call printf(). We load eax with 0 mainly because the printf function has variable arguments such as the string and a whole lot of format related parameters among others, printf can work with vector registers, for example when doing printf(“%f”, 1.0f) then the progam will have to set eax to 1 to indicate that. This stackoverflow question offers a very clear explanation on the mov eax,0 issue

At the end of the program, we see a mov eax, 0 then pop rbp and a ret being executed. The first the the first one seats eax to 0 as it gets the program ready to do a “return 0” then the second one restores the stack to its original state via picking the original stack frame value to rbp. Then ret just returns the function exiting it with 0 value.

Commands used

Today we basically used those commands:

Command Description
aaaa Fully analyze the binary
afl List all the functions in the binary
ii List imports
iI Information about the binary
iz List strings in the binary
sf function Seek to a function
pdb Print disassembly of the basic block
Reverse engineering 32 and 64 bits binaries with Radare2 - 1
Older post

Embedded systems programming with atmel328 - 4 (temperature&humidity)

Newer post

Embedded systems programming with atmel328 - 5

Reverse engineering 32 and 64 bits binaries with Radare2 - 1