Wednesday, October 15, 2014

Reversing C++ binaries 1: name mangling and global/static functions

Binary reversing is an essential skill for malware analysis and for solving wargame challenges. Programs written in C are common, and there are various tutorials about reversing them (calling conventions, dynamic libraries, the stack, variables and so on). Once you have learned assembly language, reversing an application is mostly a matter of patience (anti-reversing techniques aside, of course).

However, the assembly generated from C++ code is harder to analyze because of object-oriented constructs. These tutorials study how high-level constructs, such as namespaces, operators, classes and their relationships, are converted into assembly code and how to recognize them when analyzing a binary.

PART 1: NAME MANGLING AND FUNCTIONS

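First of all, function names are encoded into symbol names suitable for the compiler and the linker; the encoded name carries the enclosing namespaces and the parameter types. This process is called name mangling (see below for references). In the following listing, the comment above each function shows its mangled name: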

namespace
{
// _ZN12_GLOBAL__N_17ScroogeEv
int Scrooge()
{
    return 5;
}
}

// _Z11GlobalPlutov
int GlobalPluto()
{
    return 4;
}

// _ZL11GoofyStaticv
static int GoofyStatic()
{
    return 3;
}

namespace Donald
{
// _ZN6Donald12GlobalDonaldEv
int GlobalDonald()
{
    return 1;
}
// _ZN6DonaldL12StaticDonaldEv
static int StaticDonald()
{
    return 2;
}
}
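
To make the encoding concrete, here is a minimal sketch (not a real demangler) that pulls the length-prefixed identifiers out of names like the ones in the comments above. The helper name split_mangled is made up for this example, and it assumes GCC-style names with no templates, substitutions or operator names: after _Z, an N opens a nested (namespace-qualified) name, an L marks internal linkage, each identifier is preceded by its length, and E closes the nested name.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits the length-prefixed identifiers out of a
// simple GCC-style mangled name. It only handles the plain function names
// used in this post (no templates, substitutions, or operator names).
std::vector<std::string> split_mangled(const std::string& sym)
{
    std::vector<std::string> parts;
    if (sym.compare(0, 2, "_Z") != 0)
        return parts;                       // not a mangled C++ name

    std::size_t i = 2;
    while (i < sym.size()) {
        const char c = sym[i];
        if (c == 'N' || c == 'L') {         // nested-name opener / internal linkage marker
            ++i;
            continue;
        }
        if (!std::isdigit(static_cast<unsigned char>(c)))
            break;                          // 'E' or the parameter encoding: stop here

        // Read the decimal length, then take that many characters as the identifier.
        std::size_t len = 0;
        while (i < sym.size() && std::isdigit(static_cast<unsigned char>(sym[i])))
            len = len * 10 + static_cast<std::size_t>(sym[i++] - '0');
        if (len > sym.size() - i)
            break;                          // truncated symbol
        parts.push_back(sym.substr(i, len));
        i += len;
        assert(i <= sym.size());            // the cursor never runs past the symbol
    }
    return parts;
}
```

For instance, split_mangled("_ZN6Donald12GlobalDonaldEv") yields Donald and GlobalDonald; the trailing v (an empty parameter list) is ignored. For _ZN12_GLOBAL__N_17ScroogeEv the first component is _GLOBAL__N_1, which is the name the compiler gives to the anonymous namespace.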

With these names in mind, consider the following disassembly:

push   %ebp
mov    %esp,%ebp
lea    -0x1018(%esp),%esp
orl    $0x0,(%esp)
lea    0x1010(%esp),%esp
call   80487fd <_ZL11GoofyStaticv>
call   80487c1 <_Z11GlobalPlutov>
call   8048749 <_ZN6Donald12GlobalDonaldEv>
call   8048785 <_ZN6DonaldL12StaticDonaldEv>
call   804870d <_ZN12_GLOBAL__N_17ScroogeEv>
mov    $0x0,%eax
leave
ret

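From the symbol names alone we can say that the function calls two functions outside any namespace (GoofyStatic and GlobalPluto), two functions inside the 'Donald' namespace (GlobalDonald and StaticDonald) and finally a function residing in an anonymous namespace (Scrooge), which the compiler encodes as _GLOBAL__N_1. The L in front of a name marks a static function, i.e. one with internal linkage. GDB also offers an automatic demangling option: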

gdb> set print asm-demangle on
gdb> disass main
Dump of assembler code for function main:
   0x0804873f <+0>: push   %ebp
   0x08048740 <+1>: mov    %esp,%ebp
   0x08048742 <+3>: and    $0xfffffff0,%esp
   0x08048745 <+6>: lea    -0x1010(%esp),%esp
   0x0804874c <+13>: orl    $0x0,(%esp)
   0x08048750 <+17>: lea    0x1010(%esp),%esp
   0x08048757 <+24>: call   0x8048659 <main2()>
   0x0804875c <+29>: call   0x80485e9 <Donald::GlobalDonald()>
   0x08048761 <+34>: call   0x8048707 <_ZN6DonaldL12StaticDonaldEv>
   0x08048766 <+39>: call   0x8048621 <GlobalPluto()>
   0x0804876b <+44>: call   0x8048723 <_ZL11GoofyStaticv>
   0x08048770 <+49>: mov    $0x0,%eax
   0x08048775 <+54>: leave
   0x08048776 <+55>: ret
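
The same demangling can also be done from code, outside of GDB. The sketch below assumes a GCC or Clang toolchain, whose C++ runtime exposes abi::__cxa_demangle in <cxxabi.h>; the wrapper name demangle is made up for this example and is not part of any library.

```cpp
#include <cassert>
#include <cstdlib>    // std::free
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang runtimes)
#include <string>

// Hypothetical wrapper: returns the demangled form of `sym`, or `sym`
// itself when the runtime demangler rejects it.
std::string demangle(const std::string& sym)
{
    int status = 0;
    char* out = abi::__cxa_demangle(sym.c_str(), nullptr, nullptr, &status);
    if (status != 0)
        return sym;               // not a valid mangled name (or out of memory)
    assert(out != nullptr);       // status 0 means a buffer was returned
    std::string result(out);
    std::free(out);               // the buffer is allocated with malloc()
    return result;
}
```

With this wrapper, demangle("_ZN6Donald12GlobalDonaldEv") gives Donald::GlobalDonald() and demangle("_ZN12_GLOBAL__N_17ScroogeEv") gives (anonymous namespace)::Scrooge(). Returning the input unchanged on failure is a design choice for this sketch, so that plain C symbols pass through untouched.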

References:
http://www.int0x80.gr/papers/name_mangling.pdf
http://www.ofb.net/gnu/gcc/gxxint_15.html
http://en.wikipedia.org/wiki/Name_mangling#Name_mangling_in_C.2B.2B
http://stackoverflow.com/a/1962381
