In today's post we will try to understand why the output of this

``````int main(void) {
cout << "The answer is: " << ((Complex *)0)->answer() << endl;
}
``````

program is

``````The answer is: 42
``````

First of all, we need to note that of course the answer is 42. What were you thinking, guys? The second, not so obvious question is: why in the flying f#c4 does this even work?

``````((Complex *)0)->answer()
``````

Let's first decode this. Here we are

• Picking zero
• Casting it to a pointer to a complex type
• Invoking a method called answer().

Obvious question. Why does this work? Shouldn't this explode in a huge null pointer exception fireball?

Third question: how did we even get to do something as odd as this?

Let's start by trying to answer the last question, with an empty program

``````#include "../src/bmath/math.hpp"
#include "../src/bmath/complex.hpp"

#include <iostream>

using namespace BMath;
using namespace std;

int main(void) {
}
``````

This program does pretty much nothing. However, it includes two header files, one of which, as the name suggests, implements complex number arithmetic.

``````namespace BMath {
class Complex {
private:
Double real;
Double imaginary;

public:
Complex(Double real, Double imaginary);
Complex(const Complex &other);
Complex();
Complex &operator+=(const Complex &rightOp);
Complex operator+(const Complex &rightOp);
Complex operator-(const Complex &rightOp);
Complex operator*(const Complex &rightOp);

// Overload to enable toString operations
friend std::ostream &operator<<(std::ostream &stream,
BMath::Complex const &c) {
return stream << "(" << c.real << "+" << c.imaginary << "i)";
}
};
} // namespace BMath
``````

By executing

``````readelf -a bin/Debug/maths | grep Complex
FUNC    GLOBAL DEFAULT   16 _ZN5BMath7ComplexpLERKS0_
FUNC    GLOBAL DEFAULT   16 _ZN5BMath7ComplexC1ERKS0_
FUNC    GLOBAL DEFAULT   16 _ZN5BMath7ComplexC2Ev
FUNC    GLOBAL DEFAULT   16 _ZN5BMath7ComplexC1Edd
FUNC    GLOBAL DEFAULT   16 _ZN5BMath7ComplexC1Ev
FUNC    GLOBAL DEFAULT   16 _ZN5BMath7ComplexplERKS0_
FUNC    GLOBAL DEFAULT   16 _ZN5BMath7ComplexC2Edd
FUNC    GLOBAL DEFAULT   16 _ZN5BMath7ComplexmiERKS0_
FUNC    GLOBAL DEFAULT   16 _ZN5BMath7ComplexC2ERKS0_
FUNC    GLOBAL DEFAULT   16 _ZN5BMath7ComplexmlERKS0_
``````

we can verify that the Complex class was compiled and included in the binary. The journey started when I was looking at the program's machine code with radare2 and found something odd in the code generated for these three methods

``````//BMath::Complex::Complex(BMath::Complex const&)
94   0x000012b4 GLOBAL FUNC   53       BMath::Complex::Complex(BMath::Complex const&)
106  0x0000128c GLOBAL FUNC   40       BMath::Complex::Complex()
116  0x00001258 GLOBAL FUNC   52       BMath::Complex::Complex(double, double)
``````
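The mangled symbols printed by readelf and the readable names radare2 shows are the same functions. As a minimal sketch, assuming a GCC or Clang toolchain that ships `<cxxabi.h>`, the mapping can be checked with `abi::__cxa_demangle`. The helper name `demangle` is made up for this illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the readable form of an Itanium-ABI mangled
// symbol, or the input unchanged if it cannot be demangled.
std::string demangle(const char *mangled) {
  assert(mangled != nullptr);
  int status = 0;
  // __cxa_demangle allocates the result with malloc, so we must free it.
  char *readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || readable == nullptr) {
    return mangled;
  }
  std::string result(readable);
  std::free(readable);
  return result;
}
```

Calling `demangle("_ZN5BMath7ComplexC1Edd")` should give back the two-double constructor from the listing above. The `c++filt` command line tool does the same job.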

I found the following machine code being generated

``````// BMath::Complex::Complex(BMath::Complex const&)
49: fcn.000012b8 ();
; var int64_t var_10h @ rbp-0x10
; var int64_t var_8h @ rbp-0x8
0x000012b8      55             push rbp
0x000012b9      4889e5         mov rbp, rsp
0x000012bc      48897df8       mov qword [var_8h], rdi
0x000012c0      488975f0       mov qword [var_10h], rsi
0x000012c4      488b45f0       mov rax, qword [var_10h]

// BMath::Complex::Complex()
36: fcn.00001290 ();
; var int64_t var_8h @ rbp-0x8
0x00001290      55             push rbp
0x00001291      4889e5         mov rbp, rsp
0x00001294      48897df8       mov qword [var_8h], rdi
0x00001298      488b45f8       mov rax, qword [var_8h]

//BMath::Complex::Complex(double, double)
48: fcn.0000125c ();
; var int64_t var_18h @ rbp-0x18
; var int64_t var_10h @ rbp-0x10
; var int64_t var_8h @ rbp-0x8
0x0000125c      55             push rbp
0x0000125d      4889e5         mov rbp, rsp
0x00001260      48897df8       mov qword [var_8h], rdi
0x00001264      f20f1145f0     movsd qword [var_10h], xmm0
0x00001269      f20f114de8     movsd qword [var_18h], xmm1
``````

If you pay attention to the function signatures, something does not add up.

``````// Each pair below is the same function: the C++ signature first, then what radare2 reports
BMath::Complex::Complex(BMath::Complex const&)
fcn.000012b8 (var int64_t var_8h @ rbp-0x8, var int64_t var_10h @ rbp-0x10)

BMath::Complex::Complex()
fcn.00001290 (var int64_t var_8h @ rbp-0x8)

BMath::Complex::Complex(double, double)
fcn.0000125c (var int64_t var_8h @ rbp-0x8, var int64_t var_10h @ rbp-0x10, var int64_t var_18h @ rbp-0x18)
``````

The machine code defines functions that always take one more argument than the C++ signatures we defined in our program.

Now let's go back a step and see what the machine code looks like when we use these functions. Let's start with the simple case of an empty main function

``````int main(void) {
}
``````

This generated the following machine code

``````fcn.000014a0 ();
0x000014a0      55             push rbp // saves the caller's frame pointer
0x000014a1      4889e5         mov rbp, rsp // sets rbp to the current stack pointer, starting the new stack frame
0x000014a4      b800000000     mov eax, 0  // puts 0 in eax, the value main returns
0x000014a9      5d             pop rbp // restores the caller's frame pointer
0x000014aa      c3             ret // returns from the function
``````

Now let's see what gets generated when we allocate a Complex instance on the stack and just return its memory address

``````int main(void) {
Complex c;
return &c; // note: Complex* to int is not a standard conversion, so this needs a permissive compiler mode
}
``````

The machine code looks like

``````fcn.0000118d ();
; var int64_t var_20h @ rbp-0x20
; var int64_t var_8h @ rbp-0x8
0x0000118d      55             push rbp
0x0000118e      4889e5         mov rbp, rsp
0x00001191      4883ec20       sub rsp, 0x20
0x00001195      64488b042528.  mov rax, qword fs:[0x28]
0x0000119e      488945f8       mov qword [var_8h], rax
0x000011a2      31c0           xor eax, eax
0x000011a4      488d45e0       lea rax, qword [var_20h]
0x000011a8      4889c7         mov rdi, rax
0x000011ab      e8b4000000     call method BMath::Complex::Complex() ; method.BMath::Complex.Complex
0x000011b0      488d45e0       lea rax, qword [var_20h]
0x000011b4      488b55f8       mov rdx, qword [var_8h]
0x000011b8      644833142528.  xor rdx, qword fs:[0x28]
┌─< 0x000011c1      7405           je 0x11c8
│   0x000011c3      e8b8feffff     call sym.imp.__stack_chk_fail
│   ; CODE XREF from fcn.0000118d @ 0x11c1
└─> 0x000011c8      c9             leave
0x000011c9      c3             ret
``````

Ok. It seems that the compiler went full on drugs. Why the heck does it generate this much machine code for just two lines of code?

Let's take this step by step. First, note that the first three instructions and the last two just form the prologue and epilogue, so we can forget about them. Let's jump to

``````           0x00001195      64488b042528.  mov rax, qword fs:[0x28]
0x0000119e      488945f8       mov qword [var_8h], rax
...
0x000011b4      488b55f8       mov rdx, qword [var_8h]
0x000011b8      644833142528.  xor rdx, qword fs:[0x28]
┌─< 0x000011c1      7405           je 0x11c8
│   0x000011c3      e8b8feffff     call sym.imp.__stack_chk_fail
│   ; CODE XREF from fcn.0000118d @ 0x11c1
└─> 0x000011c8      c9             leave
``````

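Here we look at the next two instructions and then jump to a group of instructions near the end. Why? Because this is code the compiler generates to detect stack buffer overflows: a stack protector, also known as a stack canary (GCC's -fstack-protector family of options). On x86-64 Linux the canary value lives in thread-local storage, reachable through the fs segment at offset 0x28. Do not confuse this with address space layout randomization, which is a separate mechanism provided by the Linux kernel; you can check whether that one is enabled by reading /proc/sys/kernel/randomize_va_space. The previous code is doing the following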

``````// Reads the guard value (stack canary) from thread-local storage at fs:[0x28] into the rax register
0x00001195      64488b042528.  mov rax, qword fs:[0x28]
// Stores the guard value on the stack at [var_8h]
0x0000119e      488945f8       mov qword [var_8h], rax
``````

Then in the end

``````// Loads the guard value previously stored at [var_8h] into the rdx register
0x000011b4      488b55f8       mov rdx, qword [var_8h]
// Compares it with the original value (xor yields zero when they match)
0x000011b8      644833142528.  xor rdx, qword fs:[0x28]
// If they do not match, invoke sym.imp.__stack_chk_fail;
// otherwise jump to leave and return from the function
┌─< 0x000011c1      7405           je 0x11c8
│   0x000011c3      e8b8feffff     call sym.imp.__stack_chk_fail
│   ; CODE XREF from fcn.0000118d @ 0x11c1
└─> 0x000011c8      c9             leave
``````

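Ok, but what is this cryptic sym.imp.__stack_chk_fail? It is a function imported from the C library (glibc here), not from the kernel. It gets called when the check finds that the guard value saved on the stack was overwritten, which signals stack corruption, and it aborts the program.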

Ok we are almost done. Now we just need to understand the following

``````           0x000011a4      488d45e0       lea rax, qword [var_20h]
0x000011a8      4889c7         mov rdi, rax
0x000011ab      e8b4000000     call method BMath::Complex::Complex() ; method.BMath::Complex.Complex
0x000011b0      488d45e0       lea rax, qword [var_20h]
``````

This part is also pretty straightforward. We are using var_20h as the memory location that stores the Complex c value. The address of that stack slot, which is actually rbp-0x20, is loaded into rax and copied to rdi. Under the x86-64 System V calling convention rdi carries the first integer or pointer argument, so it is what BMath::Complex::Complex() receives.

Ok, but how do we know that var_20h is actually holding &c? Well, we know that the return value of a function is stored in rax, and the last time rax is written in the program is

``````0x000011b0      488d45e0       lea rax, qword [var_20h]
``````

So what the hell did we conclude here? We conclude that the machine code generated for the Complex C++ class is of the form

``````BMath::Complex::Complex(BMath::Complex const&)
fcn.000012b8 (this, BMath::Complex const&)

BMath::Complex::Complex()
fcn.00001290 (this)

BMath::Complex::Complex(double, double)
fcn.0000125c (this,double,double);
``````

This is actually revealing. The point here is that, at the machine level, there is no such thing as a method belonging to an object, as you might picture it in other high-level languages like Java. Here the methods are plain functions, and the class concept is built with tricks like name mangling and passing the object's address around as a hidden first argument.

What do we conclude then? First, that methods are functions that do not depend on the instance; they depend only on the type of the class. We also conclude that
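To make the "method is just a function" idea concrete, here is a minimal sketch. The struct `Pair` and the free function `Pair_sum` are hypothetical stand-ins invented for this example, not part of BMath. The free function takes the object address as an explicit first parameter, which is roughly the shape the disassembly shows.

```cpp
#include <cassert>

// Hypothetical stand-in for a class with two double fields.
struct Pair {
  double real;
  double imaginary;

  // Ordinary member function: the object address is passed implicitly.
  double sum() const { return real + imaginary; }
};

// Hand-written equivalent: the object address is an explicit first parameter.
double Pair_sum(const Pair *self) {
  assert(self != nullptr); // unlike the generated code, we check before dereferencing
  return self->real + self->imaginary;
}
```

For any valid object `p`, `p.sum()` and `Pair_sum(&p)` compute the same value; the only difference is who writes down the first argument.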

``````c->hello(argument)
``````

really means

``````class_type=get type of c
f=find_function(class_type,hello)
f(c,argument)
``````
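The pseudocode above can be approximated in real C++ with a pointer to member function. This is only a sketch: `Greeter` and `call_hello` are hypothetical names, and for non-virtual functions the "find" step is done by the compiler from the static type, not at run time.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical class used only for this illustration.
struct Greeter {
  std::string prefix;
  std::string hello(const std::string &name) const { return prefix + name; }
};

// The function is chosen from the type (Greeter), then called with the
// object pointer as its first argument.
std::string call_hello(const Greeter *c, const std::string &argument) {
  assert(c != nullptr);
  auto f = std::mem_fn(&Greeter::hello); // f = find_function(class_type, hello)
  return f(c, argument);                 // f(c, argument)
}
```

With a valid `Greeter g`, `call_hello(&g, "bob")` returns the same string as `g.hello("bob")`.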

Is this important? Well, not really, but it explains why this can work

``````((Complex *)0)->answer()

//First we force a type by casting 0 into a pointer of type Complex
((Complex *)0)
``````

But why does this work? Because under the hood it is equivalent to

``````answer(0)
``````

and if you look into the definition

``````Int BMath::Complex::answer() { return 42; }
``````

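Nowhere do we use the pointer (in this case 0), so this call happens to work fine. Keep in mind that the C++ standard calls this undefined behavior: it works with this compiler and these flags, but nothing guarantees it.

But what if instead we had this definition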

``````Int BMath::Complex::answer() { return this->real; }
``````

In this case, when we run the program we get

``````bin/Debug/maths
[1]    38696 segmentation fault (core dumped)  bin/Debug/maths
``````

Why? Because now this->real really dereferences the pointer: we are trying to read memory at address 0x0, which is an invalid memory access that ends, unsurprisingly, in a segmentation fault.
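A field access boils down to "object address plus field offset". The sketch below uses a hypothetical struct `Fields` that mirrors the two doubles of Complex (an assumption, since the real class keeps them private) to show which address would be read for a given base address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical standard-layout stand-in for the two fields of Complex.
struct Fields {
  double real;
  double imaginary;
};

// Address that would be read for a field, given the object's base address.
std::uintptr_t field_address(std::uintptr_t base, std::size_t offset) {
  assert(offset < sizeof(Fields)); // the offset must fall inside the object
  return base + offset;
}
```

Since `real` is the first field its offset is 0, so with a base address of 0 the read lands exactly on address 0, which is not mapped in a normal Linux process.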
