Wednesday, April 29, 2009

BUFFER OVERFLOW: WHEN THE DATA BECOMES INSTRUCTIONS

Buffer overflow is one of the oldest computer security problems. Ever since the Morris worm, this type of remote exploitation has been among the most common and dangerous. If you look through the recent security holes published on sources such as BugTraq or Packet Storm, you will see that at least about 1/3 of them are buffer overflows. Buffer overflows are relevant on ALL operating systems.

Let's demonstrate a buffer overflow with a small example. All code in this article is written in C and must be compiled under Windows (I am using Microsoft Visual C++ 5.0).

==========================================

/* bo.exe */

#include <stdio.h>
#include <string.h>

int oveflowing_func(char *big)
{
    char vulnerable_buffer[100];    // overflowing buffer [100 bytes]
    strcpy(vulnerable_buffer, big); // unchecked copy of big_buffer into vulnerable_buffer
    return 0;                       // leave function
}

int main(int argc, char *argv[])
{
    char big_buffer[1024];          // buffer for keyboard input [1024 bytes]

    gets(big_buffer);               // read a string from the keyboard (no length check)
    oveflowing_func(big_buffer);    // call the vulnerable function
    return 0;
}

======================================

First the user enters a string [big_buffer]; then we call [oveflowing_func] with [big_buffer] as its parameter, and inside oveflowing_func we copy [big_buffer] to [vulnerable_buffer]. The main point is that [vulnerable_buffer] is smaller than [big_buffer]. If the contents of [big_buffer] are longer than 100 symbols/bytes, [vulnerable_buffer] will overflow and we can exploit it. Now let's compile this program as a Win32 console application, run it, and enter a string more than 100 symbols long:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

... as a result we get abnormal program termination and the following error message:

Exception: access violation. Address: 0x61616161

Let's see what happened. When a procedure/function is called, the program puts the return address on the stack. The return address points to the next instruction after the function call, in our example: return 0. The function also keeps its local variables on the stack. So at the moment just before the overflow (strcpy(vulnerable_buffer,big)) the stack has the following structure:

              ___________________
             |                   |
            _|___________________|
           | |                   |
           | |                   |   *************************
100 bytes = | | vulnerable_buffer |   !!! stack grows down !!!
           | |                   |   *************************
           |_|___________________|_
             |                   | |
             |  return address   | | = 4 bytes
             |___________________|_|
             |___________________|

It's clear that strcpy(vulnerable_buffer,big) with [big] larger than [vulnerable_buffer] will overwrite the return address.
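The overwrite can be modelled safely without touching a real stack. The following is only a sketch: the struct frame_model and the function simulate_copy are hypothetical names, and the layout (buffer followed directly by the return-address slot) is a simplifying assumption; a real frame may hold extra bytes between the two.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of the frame: the buffer followed directly by the
   return-address slot. This layout is an assumption for illustration. */
struct frame_model {
    unsigned char vulnerable_buffer[100];
    unsigned char return_address[4];
};

/* Copy `len` input bytes to the start of the modelled frame, the way an
   unchecked copy would, but never past the end of the model itself.
   Returns the return-address slot read as a little-endian 32-bit value. */
uint32_t simulate_copy(const unsigned char *input, size_t len)
{
    struct frame_model frame;
    unsigned char *raw = (unsigned char *)&frame;

    memset(&frame, 0, sizeof frame);
    if (len > sizeof frame)
        len = sizeof frame;         /* keep the simulation itself in bounds */
    memcpy(raw, input, len);

    return (uint32_t)frame.return_address[0]
         | (uint32_t)frame.return_address[1] << 8
         | (uint32_t)frame.return_address[2] << 16
         | (uint32_t)frame.return_address[3] << 24;
}
```

With 100 bytes of input or fewer the modelled return address stays zero; with a longer run of 'a' characters it becomes 0x61616161, the same value that appears in the access violation message.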

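The address 0x61616161 in the error message is four 'a' characters (ASCII code 0x61) from our input: the return address was overwritten by our string, so we managed to pass control to an address placed in that string. Now we can create an input string containing a small piece of code (processor instructions) that does everything we need and, by changing the return address, pass control to it. We have the exploit.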

First we need to know which positions in the string end up in the return address. The listing alone does not tell us. There are two ways of finding them. We can disassemble our program, for example with Interactive Disassembler, but then we need to know exactly which function contains the overflow. The other way is experimental, and that is the one we will use. First, we need to form a string of all symbols with ASCII codes from 32 to 255:

=========================================
/* ascii.exe */

#include <stdio.h>

int main(void)
{
    int i;

    /* print every character code from 32 to 255 */
    for (i = 32; i < 256; i++) printf("%c", i);
    return 0;
}

===========================================

Pay attention to one thing: the exploit string must NOT contain the end-of-line symbols: NUL (0x00), LF (0x0a), CR (0x0d), EOF (0x1a). If one of these symbols appears in the string, the program will read and copy only the part of our string up to that symbol (strcpy, for example, stops at NUL). That's why we use codes from 32 to 255.

Compile and run:

c:\>ascii.exe | bo.exe

And we have:

Exception: access violation. Address: 0x8b8a8988

The address 0x8b8a8988 (remember that bytes within a machine word are stored in reversed order, from right to left: 0x8b8a8988 is correct, 0x88898a8b is wrong) means that our string overlaps the return address beginning from position 0x88 - 0x20 (32) = 104 (positions are counted from 0, but our string starts at code 32). So we need to form a string whose symbols in positions 104, 105, 106 and 107 (the return address is 4 bytes, one machine word) contain the address we want to pass control to.
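The arithmetic above can be written down as a small sketch. The helper names offset_from_fault and store_le32 are hypothetical, and the calculation assumes the test string is a run of consecutive byte values shorter than 256 bytes, as produced by ascii.exe.

```c
#include <stdint.h>

/* Given the faulting address reported after feeding the program a run of
   consecutive byte values starting at `first_code`, return the string
   position that landed in the lowest byte of the return address.
   On little-endian x86 that lowest byte is the first one copied. */
int offset_from_fault(uint32_t fault_address, unsigned first_code)
{
    unsigned low_byte = fault_address & 0xffu;   /* 0x88 for 0x8b8a8988 */
    return (int)low_byte - (int)first_code;
}

/* Write a 32-bit value into four consecutive bytes in the reversed
   (little-endian) order that x86 uses for a machine word. */
void store_le32(unsigned char *dst, uint32_t value)
{
    dst[0] = (unsigned char)(value & 0xffu);
    dst[1] = (unsigned char)((value >> 8) & 0xffu);
    dst[2] = (unsigned char)((value >> 16) & 0xffu);
    dst[3] = (unsigned char)((value >> 24) & 0xffu);
}
```

For the fault address 0x8b8a8988 and a first code of 0x20, offset_from_fault gives 104. Storing 0x8b8a8988 with store_le32 produces the bytes 0x88, 0x89, 0x8a, 0x8b in that order, which is exactly how they appeared in our string.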

Now we need to decide where to place the code. There are 2 possibilities: from the beginning of the string up to position 104, or from position 108 onward. The first gives us only 104 bytes for code, which is why we choose the second one. The string before position 104 can be filled with the code of 'NOP' (No OPeration), 0x90.

Next we need to determine which address to place instead of the return address. Let's look at the registers and memory after the 'RET' instruction (= 'return' in C). Let's look at the stack again:

      ___________________
     |                   |
    0|___________________|
     |                   |
     |                   |   *************************
     | vulnerable_buffer |   !!! stack grows down !!!
     |                   |   *************************
  104|___________________|
     |                   |
     |  return address   |
  108|___________________|
     |___________________| <- ESP

On the left are positions in our string. As you can see, after the 'RET' instruction the ESP register points to position 108 in our string. So all we need is to execute the instruction jmp esp. To do that, we have to find in memory the 2-byte combination 0xff 0xe4 (jmp esp). The address of these 2 bytes will be the new return address. So we have the following execution order:

RET -> JMP ESP -> exploit code

This combination can be found in our program's memory or in the memory area of a DLL. The best choice is the program itself or a DLL that is loaded into the program. The image of a DLL in memory starts at its Image base. We can easily find it using any utility that analyses PE headers.

D:\exploit>listdlls bo
. . .
Base        Size     Version         Path
0x00400000  0x27000                  C:\bo.exe
0x77f60000  0x5c000  4.00.1381.0130  D:\WINNT\System32\ntdll.dll
0x77f00000  0x5e000  4.00.1381.0133  D:\WINNT\system32\KERNEL32.dll

Here Base is the Image base of each module. Now we know where the program and the DLLs are placed in memory. We can search for these 2 bytes (jmp esp) in the executable file(*) or in memory(**). The latter is more convenient for me, as we don't need to convert file offsets.
Let's use a debugger, for example Soft ICE for Windows. Start the Soft ICE service, run bo.exe, and cause a buffer overflow as we did earlier. When the exception is raised, we will see the Soft ICE console. To search for our combination, type this command: s 1000000 l ffffffff ff e4 - it means search for the byte combination 0xff 0xe4 beginning at address 0x01000000 (the first address whose highest byte is not zero) up to 0xFFFFFFFF. As a result we have:

Pattern found at 0023:77f327e5 (77f327e5)


The combination was found at 0x77f327e5. Well done! This address doesn't contain the end-of-line codes 0x00, 0x0a, 0x0d, 0x1a, so we can safely use it. The instruction is inside KERNEL32.DLL (range 0x77f00000 - 0x77f5e000), whose contents depend on the service pack installed, so the address will probably differ on your machine.
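Both checks on the found address (which module it belongs to, and whether any of its bytes would cut the string short) can be sketched in a few lines. The names module_for_address and address_is_usable are hypothetical; the table holds the values from the listdlls output above, which are specific to that machine, and CR is taken as 0x0d.

```c
#include <stddef.h>
#include <stdint.h>

struct module_range {
    const char *name;
    uint32_t base;
    uint32_t size;
};

/* Values copied from the listdlls output; they will differ elsewhere. */
static const struct module_range modules[] = {
    { "bo.exe",       0x00400000u, 0x27000u },
    { "ntdll.dll",    0x77f60000u, 0x5c000u },
    { "KERNEL32.dll", 0x77f00000u, 0x5e000u },
};

/* Return the name of the module whose range contains `address`, or NULL. */
const char *module_for_address(uint32_t address)
{
    size_t i;
    for (i = 0; i < sizeof modules / sizeof modules[0]; i++) {
        if (address >= modules[i].base &&
            address - modules[i].base < modules[i].size)
            return modules[i].name;
    }
    return NULL;
}

/* Return 1 if none of the four address bytes is an end-of-line code
   (0x00, 0x0a, 0x0d, 0x1a), otherwise 0. */
int address_is_usable(uint32_t address)
{
    int shift;
    for (shift = 0; shift < 32; shift += 8) {
        unsigned b = (address >> shift) & 0xffu;
        if (b == 0x00 || b == 0x0a || b == 0x0d || b == 0x1a)
            return 0;
    }
    return 1;
}
```

For 0x77f327e5 the lookup returns KERNEL32.dll and the address is reported as usable, while any address inside bo.exe is rejected because its highest byte is zero.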

If you search in the file instead: bo.exe is loaded in the range 0x00400000 : 0x00427000 - that means the highest byte of its addresses is always zero - so we cannot use these addresses.

KERNEL32.DLL is in the range 0x77f00000 : 0x77f5e000, so it is usable: open it in any hex viewer and search for 0xff 0xe4 - then add the Image base to the offset found, and we get the same result: 0x77f327e5.
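What the hex viewer does here is a plain byte search. A minimal sketch follows; find_jmp_esp and address_from_offset are hypothetical names, and the sketch assumes that offsets in the searched buffer match offsets in the loaded image, which need not hold for a raw file on disk.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first 0xff 0xe4 pair in `image`, or -1 if the
   pair does not occur. */
long find_jmp_esp(const unsigned char *image, size_t len)
{
    size_t i;
    for (i = 0; i + 1 < len; i++) {
        if (image[i] == 0xff && image[i + 1] == 0xe4)
            return (long)i;
    }
    return -1;
}

/* Turn an offset inside a loaded image into an absolute address by
   adding the Image base. */
uint32_t address_from_offset(uint32_t image_base, long offset)
{
    return image_base + (uint32_t)offset;
}
```

With the Image base 0x77f00000 and an offset of 0x327e5, address_from_offset gives 0x77f327e5, the address the debugger reported.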

Now you have everything you need: you know where to place the binary code and you know what value to put in place of the return address. That's all. All that remains is to create the exploit.