Dec

2011

One of the parts of the PHDays Quals was the ‘meteorite rain’ archive, containing many small and not so small challenges. One of these (M100) was tougher than most and quite interesting, so we decided to do a writeup.

The file M100 is a Windows console program written in C++. This means it’s a bit of a pain to reverse engineer. One of my teammates did the reverse engineering but then got stuck, so he asked if I could take a look at it. Basically the program looked like this when translated to a simple C program:

#include <stdio.h> int doround(int data, int key) { int i, result, outbit; result = 0; for (i = 0; i < 8; i++) { if ( key & 1 ) result ^= data; outbit = data & 0x80000000; data <<= 1; if ( outbit ) data ^= 0x1Bu; key >>= 1; } return result; } void doit(int* inbuf, int* outbuf) { int a = inbuf[0], b = inbuf[1], c = inbuf[2], d = inbuf[3]; outbuf[0] = doround(a, 14) ^ doround(d, 9) ^ doround(c, 13) ^ doround(b, 11); outbuf[1] = doround(b, 14) ^ doround(a, 9) ^ doround(d, 13) ^ doround(c, 11); outbuf[2] = doround(c, 14) ^ doround(b, 9) ^ doround(a, 13) ^ doround(d, 11); outbuf[3] = doround(d, 14) ^ doround(c, 9) ^ doround(b, 13) ^ doround(a, 11); } void main() { int inbuf[4]; int outbuf[4]; FILE* fp = fopen("ctf-pass-file.txt","r"); fread(inbuf, sizeof(int), sizeof(inbuf) / sizeof(int), fp); fclose(fp); doit(inbuf, outbuf); if (memcmp("Ctf2011 8d542404", outbuf, sizeof(outbuf))) { printf("Fail!!\n"); } else { printf("Win!!!\n"); // calculate a flag based on inbuf, and print it } }

The goal is to calculate an input that will make doit() produce the output “Ctf2011 8d542404”, and then it will produce a valid flag from your input.

The problem is that all of the output integers depend on all the input integers, so there’s no clear place to start solving this. Bruteforcing 4 * 32 = 128 bits is out of the question of course!

With no good ideas so far I first rewrote the algorithm in my favorite language, Python. I then started simplifying the algorithm to make sure I understand it.

The first thing to notice is that the operations involving ‘data’ are just implementing a LFSR (Linear Feedback Shift Register). The second is even though the loop goes for 8 iterations, the last 4 iterations never change the output because doround() is never called with a key argument that has bits higher than the 4th bit set (14 = 0b1110, 9 = 0b1001, 13 = 0b1101, 11 = 0b1011).

Now, if we have the value of ‘out’ and the value of ‘data’ at the end of doround() we can calculate the inputs, because the sequence generated by a LFSR can be reversed. Unfortunately we don’t have the final state of ‘data’ and only have the value of ‘out’ xor’ed with three other numbers, so this does not seem to be much help.

def doround(data, key): out = 0 for i in range(4): if key & (1 << i): out ^= data outbit = data >> 31 data = ((data << 1) ^ (outbit * 0x1b)) & 0xffffffff return out with open('ctf-pass-file.txt') as f: result = f.read(16) import struct a,b,c,d = struct.unpack('IIII', result) r1 = doround(a, 14) ^ doround(d, 9) ^ doround(c, 13) ^ doround(b, 11) r2 = doround(b, 14) ^ doround(a, 9) ^ doround(d, 13) ^ doround(c, 11) r3 = doround(c, 14) ^ doround(b, 9) ^ doround(a, 13) ^ doround(d, 11) r4 = doround(d, 14) ^ doround(c, 9) ^ doround(b, 13) ^ doround(a, 11) print repr(struct.pack('IIII', r1, r2, r3, r4))

It was about at this point that it starts to be apparent that the output bits (which are compared against the string “Ctf2011 8d542404”) are each just the xor of a selection of the input bits (the data read from ctf-pass-file.txt). The only part where it’s not completely clear is the LFSR code, and that can be rewritten as follows (since 0x1b = 0b11011):

data = ((data << 1) ^ outbit ^ (outbit << 1) ^ (outbit << 3) ^ (outbit << 4)) & 0xffffffff

So it is possible to write 128 equations expressing each output bit as the xor of some of the input bits. By solving this system of equations you can get the correct input bits! Solving such systems of equations is quite easy using Gaussian Elimination, so the tricky part is just getting these equations.

Doing this by hand is a recipe for disaster. So, let’s rewrite the program to do the same thing as before but instead of working with 32-bit integers we’ll use lists containing 32 sets. For every bit there is a set which tracks which input bits it is the xor of.

The operations on these values also have to be adjusted slightly:

- Instead of shifting an integer we create a new list with the sets in their new positions.
- Instead of xor’ing two integers we xor the sets for every bit. This is correct because the output bit is the xor of all the inputs in the two sets of the operands. If an input bit is in both operands, the term cancels out because a ^ b ^ b == a.

When you run the program it will perform all the calculations as before, but instead of having to provide input integers and getting output integers, you provide nothing and get a list of 128 sets which represent the input bits that are xor’ed together to form the outputs. You can combine this with the wanted value of the outputs (the 128 bits from the value “Ctf2011 8d542404”) to form 128 equations which can be solved using Gaussian Elimination.

The following code implements this strategy. In this implementation the sets previously mentioned are implemented by using a 128-bit integer as a bitfield.

As an example, if you had a 32-bit integer variable n in the original version, in this version you’ll have a list of 32 128-bit integers. If for example the first item in this list is 0b101, it means the first bit of n is the xor of the first and third bits from the input.

import operator # convenience function to xor an arbitrary number of values in our fancy format # all values must have the same number of bits or they get truncated def x(*args): return [reduce(operator.xor, t) for t in zip(*args)] def doround(data, key): out = [0] * len(data) for i in range(4): if key & (1 << i): out = x(out, data) # the bit which is shifted out outbit = data[31] # left shift: new lsb depends on none of the input bits, so 0 data = [0] + data[:31] # xor data with outbit # note that it's correct to always do this: # if the value of outbit is 0, then this xor does nothing data = x(data, [outbit, outbit, 0, outbit, outbit] + [0] * (32-5)) return out # each bit from the input depends only on itself input_bits = [1 << i for i in range(128)] a = input_bits[ 0:32] b = input_bits[32:64] c = input_bits[64:96] d = input_bits[96:128] # note: '+' is list append here. this is forming one 128-bit value from 4 32-bit values. output_bits = x(doround(a, 14), doround(d, 9), doround(c, 13), doround(b, 11)) \ + x(doround(b, 14), doround(a, 9), doround(d, 13), doround(c, 11)) \ + x(doround(c, 14), doround(b, 9), doround(a, 13), doround(d, 11)) \ + x(doround(d, 14), doround(c, 9), doround(b, 13), doround(a, 11)) import struct words = struct.unpack('IIII', 'Ctf2011 8d542404') # 128-bit desired output (an actual integer) output_val = sum(w << (32*i) for i,w in enumerate(words)) # append a bit representing the desired result to each 128-bit integer to form 128 equations. # each equation consists of 128 bits saying whether that input term is included in the xor sum, # and then one bit representing the result. equations = [(b << 1) | ((output_val >> i) & 1) for i,b in enumerate(output_bits)] def printeq(caption): print "\n%s\n" % caption for i, eq in enumerate(equations): eqstr = ''.join(str((eq >> i) & 1) for i in range(128+1)) print "%3d: %s = %s" % (i, eqstr[1:], eqstr[0]) printeq("Equations in input order") # perform gaussian elimination for i in range(len(equations)): equations.sort(reverse=True) for j in range(i+1, len(equations)): if equations[j].bit_length() < equations[i].bit_length(): break equations[j] ^= equations[i] printeq("Equations after elimination") # calculate the xor of all the bits in an integer def parity(n): return bin(n).count('1') & 1 # calculate the proper input. view output of program to help understand this. # remember upper 128 bits of the equation are input coefficients and low bit is desired result. result = 0 for i, eq in enumerate(reversed(equations)): # result for bit 0 is in eqn 127 result |= (parity(result & (eq >> 1)) ^ (eq & 1)) << i # result now contains the desired 128-bit value. store it as 4 32-bit words with the # proper endianness. result = struct.pack('IIII', *((result >> (32*i)) & 0xffffffff for i in range(4))) print "\nResult:\n" print result.encode('hex') print

And then when you are feeling quite pleased with yourself, you can submit the resulting flag for the half point (!) it was worth. π

Perhaps PHD CTF organisers know some much easier way to solve this?

“Perhaps PHD CTF organisers know some much easier way to solve this?”

I know ..

(Edited by admin: fixed double post, inserted code from original pastebin)

What the… I’m going to have bad dreams of LFSRs tonight π

Looking at the behavior and duplicating it with a period of 2 instead of 4 I can kind of understand this now, but is there any document on this kind of construction? Seems there is a lot I don’t know there.

omg xDD cyclic

Nice approach!

My solution if any interested π

I agree, half points is too small for the task =