13 Jan 2013

GitS 2013 Teaser – What the hell is Keming?

Find key the.

We start off with an xz-compressed file:

$ file 35e25782a7b3b88409e58756e63c40c2.bin 
35e25782a7b3b88409e58756e63c40c2.bin: XZ compressed data
$ xz -dc 35e25782a7b3b88409e58756e63c40c2.bin > data
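
For anyone who would rather stay in Python, here is a minimal sketch of the same step using the standard lzma module. The sniff helper and its small magic table are our own illustration (not how file works internally), and it only knows the handful of formats that show up in this write-up. The demo input is synthetic, not the challenge file.

```python
import gzip
import lzma

# Leading magic bytes for the formats seen in this write-up (illustrative subset).
MAGICS = [
    (b'\xfd7zXZ\x00', 'xz'),
    (b'\x1f\x8b', 'gzip'),
    (b'wOFF', 'woff'),
]

def sniff(data):
    """Return a short format name based on magic bytes (hypothetical helper)."""
    for magic, name in MAGICS:
        if data.startswith(magic):
            return name
    # tar keeps its magic ("ustar") at offset 257 rather than at the start
    if data[257:262] == b'ustar':
        return 'tar'
    return 'unknown'

def unxz(data):
    """Decompress an XZ container in memory, roughly what `xz -dc` does."""
    return lzma.decompress(data)

if __name__ == '__main__':
    # Synthetic stand-in for the challenge file: a gzip stream wrapped in xz.
    blob = lzma.compress(gzip.compress(b'hello'))
    print(sniff(blob))        # xz
    print(sniff(unxz(blob)))  # gzip
```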

At first, the resulting file looks like an ordinary gzip:

$ file data 
data: gzip compressed data, was "43", from Unix, last modified: Fri Jan 11 00:54:03 2013

Closer inspection reveals that we’re actually dealing with multiple (100) gzip streams concatenated together:

$ binwalk data
DECIMAL   	HEX       	DESCRIPTION
-------------------------------------------------------------------------------------------------------
0         	0x0       	gzip compressed data, was "43", from Unix, last modified: Fri Jan 11 00:54:03 2013
956       	0x3BC     	gzip compressed data, was "38", from Unix, last modified: Fri Jan 11 00:54:03 2013
1912      	0x778     	gzip compressed data, was "45", from Unix, last modified: Fri Jan 11 00:54:03 2013
2868      	0xB34     	gzip compressed data, was "16", from Unix, last modified: Fri Jan 11 00:54:03 2013

Caveat: using gzip -d to decompress the file results in all streams being output to a single file named ’43’.
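
The caveat is easy to reproduce on a small scale. The sketch below builds two gzip members in memory, each carrying its own stored file name, and shows that a plain decompress joins them in file order. The make_member helper and the names '0' and '1' are made up for the demonstration.

```python
import gzip
import io

def make_member(name, payload):
    """Build one gzip member that records `name` in its FNAME header field."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=name, mode='wb', fileobj=buf) as gz:
        gz.write(payload)
    return buf.getvalue()

# Two members, deliberately stored out of order, as in the challenge.
blob = make_member('1', b'world') + make_member('0', b'hello ')

# A plain decompress walks every member and joins the output in file order;
# the stored names play no part in the result.
joined = gzip.decompress(blob)
print(joined)  # b'worldhello '
```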

Since the original file names suggest an ordering, we’d rather extract each stream to its own file. A small bit of Python helps achieve this:

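#!/usr/bin/python2
import re, sys

# Read in binary mode so the compressed bytes are not mangled
data = open(sys.argv[1], 'rb').read()

# Each gzip member starts with the magic 1f 8b followed by 08 (deflate).
# Caveat: the same bytes can occur by chance inside compressed data.
gzip_offsets = [m.start() for m in re.finditer('\x1f\x8b\x08', data)]
# Sentinel so that the last stream runs to the end of the file
gzip_offsets.append(len(data))

for i in xrange(len(gzip_offsets)-1):
    start, end = gzip_offsets[i:i+2]

    open('%03d.gz' % i, 'wb').write(data[start:end])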

Now, uncompress all individual files to a separate directory, while preserving original filenames (gzip -dN *.gz). Clean up the leftover gzips and concatenate the uncompressed files (rm *.gz; cat * > concat.bin).

$ file concat.bin 
concat.bin: POSIX tar archive (GNU)
$ tar xvf concat.bin
keming/
keming/index.html
keming/pronoun.woff
keming/preposition.woff
keming/adjective.woff
keming/interjection.woff

The tar contains a single HTML file plus the four WOFF fonts it references. The HTML file contains excerpts from (copyright-free) books with some formatting, but it doesn’t look like a key in any way.

Examining the font files with a hex editor or hexdump shows that one of them, interjection.woff, contains a suspicious string (GitS) near its start.

$ hexdump -n64 -C interjection.woff 
00000000  77 4f 46 46 00 01 00 00  00 00 46 98 00 10 00 00  |wOFF......F.....|
00000010  00 00 5b fc 00 00 00 00  00 00 00 00 00 00 00 00  |..[.............|
00000020  00 00 00 00 00 00 00 00  00 00 00 00 47 69 74 53  |............GitS|
00000030  00 00 01 6c 00 00 00 64  00 00 00 85 3f e5 2f 23  |...l...d....?./#|

To decipher the significance of GitS, we refer to the WOFF specification, which defines a 44-byte header followed by the table directory. Again, we use Python to parse the table directory:

#!/usr/bin/python2
import struct, sys, zlib

WOFF_HEADER_SIZE = 44
WOFF_TD_ENTRY_SIZE = 20

def parse_table(fp, num):
    for i in xrange(num):
        tag, offset, comp_len, orig_len, checksum \
            = struct.unpack('>4s4I', fp.read(WOFF_TD_ENTRY_SIZE))

        print 'magic:%s offset:%d comp_len:%d orig_len:%d checksum:%08x' % (tag, offset, comp_len, orig_len, checksum)

fp = open(sys.argv[1], 'rb')  # binary mode: a WOFF file is not text
header = fp.read(WOFF_HEADER_SIZE)

# Retrieve signature/flavor/length/number of WOFF table directory entries
sig, flavor, length, num_tables = struct.unpack('>IIIH', header[:14])

# Parse table directory
fp.seek(WOFF_HEADER_SIZE)
parse_table(fp, num_tables)

$ parse.py interjection.woff 
magic:GitS offset:364 comp_len:100 orig_len:133 checksum:3fe52f23
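
For reference, here is a rough Python 3 equivalent of the parser that works on bytes and returns the entries instead of printing them. The struct layouts follow the WOFF 1.0 header (44 bytes) and table directory entry (20 bytes). The demo input is the 64 bytes from the hexdump above; since the real header announces 16 tables and we only have the first entry at hand, the demo patches the table count to 1. That patch, and the names parse_directory and HEAD, exist only for this illustration.

```python
import struct

# WOFF 1.0 header: signature, flavor, length, numTables, reserved,
# totalSfntSize, majorVersion, minorVersion, metaOffset, metaLength,
# metaOrigLength, privOffset, privLength  (44 bytes, big-endian)
WOFF_HEADER = struct.Struct('>4sIIHHIHHIIIII')
# Table directory entry: tag, offset, compLength, origLength, origChecksum
WOFF_ENTRY = struct.Struct('>4sIIII')

def parse_directory(data):
    """Return the table directory of a WOFF 1.0 file as a list of dicts."""
    fields = WOFF_HEADER.unpack_from(data, 0)
    signature, num_tables = fields[0], fields[3]
    if signature != b'wOFF':
        raise ValueError('not a WOFF file')
    entries = []
    for i in range(num_tables):
        tag, offset, comp_len, orig_len, checksum = WOFF_ENTRY.unpack_from(
            data, WOFF_HEADER.size + i * WOFF_ENTRY.size)
        entries.append({
            'tag': tag, 'offset': offset,
            'comp_len': comp_len, 'orig_len': orig_len,
            'checksum': checksum,
            # Per the spec, a table is zlib-compressed iff it got smaller.
            'compressed': comp_len < orig_len,
        })
    return entries

# First 64 bytes of interjection.woff, copied from the hexdump above.
HEAD = bytes.fromhex(
    '774f4646 00010000 00004698 00100000'
    '00005bfc 00000000 00000000 00000000'
    '00000000 00000000 00000000 47697453'
    '0000016c 00000064 00000085 3fe52f23')

if __name__ == '__main__':
    # Demo only: pretend there is a single table so 64 bytes are enough.
    demo = HEAD[:12] + struct.pack('>H', 1) + HEAD[14:]
    for entry in parse_directory(demo):
        print(entry)
```

Running the same function over all four fonts and comparing the tag lists would be one way to spot an out-of-place table without eyeballing hexdumps.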

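Referring back to the specification, we learn that the ‘GitS’ table occupies 100 bytes at absolute offset 364 and, because its compressed length (100) is smaller than its original length (133), is stored as a zlib stream. Some shell-fu to wrap this up: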

$ dd if=interjection.woff skip=364 bs=1 count=100 2>/dev/null | zlib-flate -uncompress
key{7351c2a3c100cf0168e1e9fb842e5938cbfb34bb8175c6d0075bbf50c0388576f4a1de8c485408a9d43602aabc2e7d217f78eea2aa01077fcb5a99ba390a21c8}
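
The dd and zlib-flate pipeline has a short Python counterpart. This is a sketch under the same reading of the specification: slice comp_len bytes at the given offset, inflate them only if the table shrank, and check the result against orig_len. The extract_table name and the demo data are invented here; the demo does not contain the real key.

```python
import zlib

def extract_table(data, offset, comp_len, orig_len):
    """Return the original bytes of one WOFF table (hypothetical helper)."""
    raw = data[offset:offset + comp_len]
    if comp_len < orig_len:
        # Stored compressed: a plain zlib stream, as zlib-flate expects.
        raw = zlib.decompress(raw)
    if len(raw) != orig_len:
        raise ValueError('table length does not match orig_len')
    return raw

if __name__ == '__main__':
    # Synthetic file: padding, one compressed table at offset 364, then junk.
    secret = b'key{not-the-real-key}' * 4
    packed = zlib.compress(secret)
    blob = b'\x00' * 364 + packed + b'trailing bytes'
    print(extract_table(blob, 364, len(packed), len(secret)))
```

With the real font, the call would be extract_table(data, 364, 100, 133), using the values printed by the parser.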

Looking back, this was quite a simple task. Unfortunately, I (embarrassingly) spent quite some time attempting to stitch a tar file back together, because I only learnt later that gzip will happily decompress multiple concatenated streams into a single file, ignoring the filenames stored in every stream but the first. Takeaway message: never trust gzip.
