26 March 2022

dreamland writeup - rev - zer0pts ctf 2022

by m4gnum

intro

dreamland is a reversing task from zer0pts CTF 2022. It was solved by 11 teams, and received a scoring of 266/500 points. In this writeup, I’ll share my solution & thought process.

In the challenge, we’re given two files - encryptor and flag.txt.enc. Looks like encryptor is a 64-bit ELF file, and flag.txt.enc is just a blob of data:

$ file ./encryptor
./encryptor: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=9f14fbf9a2d25769a71885bde207d3d675f7360f, for GNU/Linux 3.2.0, not stripped
$ xxd flag.txt.enc
00000000: 954d f763 fa38 aa68 39bb 8210 985f bf6b  .M.c.8.h9...._.k
00000010: 81b5 f0f7 7806 da24 0a27 e984 c0b1 045b  ....x..$.'.....[
00000020: f018 4a00 c208 a21a 77cb cb1e d4bc d754  ..J.....w......T
00000030: 48ad 59f8 884a bc1b bcc5 a790 90bb 75cd  H.Y..J........u.
00000040: b4a2 020c 4d06 d1e6 cef2 61a9 5dd6 e708  ....M.....a.]...
00000050: 05                                       .

It’s pretty obvious now that our goal is probably to decrypt flag.txt.enc by finding out how encryptor works and writing a script that does the opposite.

startup black-box analysis

Let’s run encryptor and see what happens:

$ ./encryptor
Usage: ./encryptor <flag file>

Seems like it expects a flag file. Let’s create one and pass it now:

$ echo zer0pts{myflag} > fake.txt
$ ./encryptor fake.txt

We’re given no prompt, but looks like a file called fake.txt.enc was created:

$ ls -la fake.txt*
-rw-rw-r-- 1 omerk omerk 16 Mar 26 13:19 fake.txt
-rw-rw-r-- 1 omerk omerk 42 Mar 26 13:19 fake.txt.enc
$ xxd fake.txt.enc
00000000: 5287 e543 968d 1c34 8e44 81cb ecdb c49a  R..C...4.D......
00000010: 49e3 8906 d784 cf08 b99e 4d69 0008 691c  I.........Mi..i.
00000020: 59bb 4afc 5a82 fe29 3c08                 Y.J.Z..)<.

As we can see, fake.txt.enc is of length 42, while fake.txt is of length 16. Seems like extra 26 bytes were appended / prepended to the encrypted blob, which probably can be used to recover the original data. Let’s run it now with strace and see what else can we learn about the binary:

$ strace ./encryptor fake.txt
[...]
openat(AT_FDCWD, "fake.txt", O_RDONLY)  = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=16, ...}) = 0
read(3, "zer0pts{myflag}\n", 4096)      = 16
read(3, "", 4096)                       = 0
openat(AT_FDCWD, "fake.txt", O_RDONLY)  = 4
openat(AT_FDCWD, "fake.txt.enc", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 5
getrandom("\xf6\x83\xdc\x0b\xa3\xe6\x7e\x87\xc6\x17", 10, 0) = 10
getrandom("\x81\xa8\x04\x8b\x7f\xdd\x15\xc6", 8, 0) = 8
fstat(4, {st_mode=S_IFREG|0664, st_size=16, ...}) = 0
read(4, "zer0pts{myflag}\n", 4096)      = 16
fstat(5, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
read(4, "", 4096)                       = 0
close(4)                                = 0
write(5, "C\277\3\261B\t\370w\320\1h\322\212rO\313\30\363\323\366\337\254\177VT={\263\5\236J\373"..., 42) = 42
close(5)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++

There’s not a lot of interesting information here, but we can indeed see that the binary requests two random blobs - one of length 8 and another of length 10. These are probably used as encryption keys, but we’ll have to verify that later on. Except for that, everything is pretty much as we’ve already learned from looking at the result of the execution. With that in mind, we can dive in and start analyzing the binary in IDA.

in-depth white-box analysis

Let’s load this binary into IDA and take a look at the main function under the decompiler:

int __cdecl main(int argc, const char **argv, const char **envp)
{
  if ( argc > 1 )
  {
    check_ascii(argv[1]);
    encrypt_file(argv[1]);
    return 0;
  }
  else
  {
    printf("Usage: %s <flag file>\n", argv[0]);
    return 1;
  }
}

Nice, so we’ve got symbols that tell us the binary first validates the input data is a stream of ASCII characters, and then calls the encrypt_file function. Let’s take a look inside it (note, I modified the variable names to make stuff more readable):

void __fastcall encrypt_file(const char *plaintext_filename)
{
  // [...]
  plaintext_filename_len = strlen(plaintext_filename);
  ciphertext_filename = (char *)malloc(plaintext_filename_len + 5);
  sprintf(ciphertext_filename, "%s.enc", plaintext_filename);
  plaintext_stream = fopen(plaintext_filename, "rb");
  if ( plaintext_stream )
  {
    ciphertext_stream = fopen(ciphertext_filename, "wb");
    if ( ciphertext_stream )
    {
      free(ciphertext_filename);
      dream_descriptor = malloc(0x140uLL);
      if ( dream_descriptor )
      {
        getrandom(random_10, 10LL, 0LL);
        getrandom(random_8, 8LL, 0LL);
        initialize_dream(dream_descriptor, random_10, random_8);
        while ( !feof(plaintext_stream) && fread(&written_byte, 1uLL, 1uLL, plaintext_stream) == 1 )
        {
          dream = create_dream(dream_descriptor);
          written_byte ^= dream;
          fwrite(&written_byte, 1uLL, 1uLL, ciphertext_stream);
        }
        for ( i = 0; i <= 12; ++i )
        {
          written_byte = 0;
          for ( j = 0; j <= 7; ++j )
          {
            if ( 8 * i + j <= 99 )
              written_byte |= *((unsigned __int8 *)dream_descriptor + 8 * i + j + 120) << j;
          }
          fwrite(&written_byte, 1uLL, 1uLL, ciphertext_stream);
        }
        for ( k = 0; k <= 12; ++k )
        {
          written_byte = 0;
          for ( m = 0; m <= 7; ++m )
          {
            if ( 8 * k + m <= 99 )
              written_byte |= *((unsigned __int8 *)dream_descriptor + 8 * k + m + 220) << m;
          }
          fwrite(&written_byte, 1uLL, 1uLL, ciphertext_stream);
        }
        fclose(plaintext_stream);
        fclose(ciphertext_stream);
        free(dream_descriptor);
      }
    }
  }
}

As we can see, stuff matches our assumptions pretty well. The binary creates a new file whose called <original file>.enc, and writes the encrypted content into it. The encrypted content is built out of two sections - the first one is the original data XORed against a generated keystream, which is probably generated based on the two random buffers (which we have already spotted). It seems like the tracking of the keystream is done using a struct I called dream_descriptor (the create_dream was one of the symbols, so I decided to just stick with it), which is of size 0x140 bytes, and probably contains data related to generating the keystream. The second section of the encrypted file contents is built of 2 13-byte blocks, which are just a compressed form of bitmaps that the dream_descriptor holds. Each one of these bitmaps is 100-bit sized, so they are compressed into 13 bytes (the last 4 bits of the last byte are left unused). The first bitmap is at offset 120 of the dream_descriptor, and the second one is at offset 220. To sum up, that’s the structure of the encrypted file:

data	description	size
encrypted buffer	plaintext ^ keystream	N bytes
compressed bitmap 1	dream_descriptor + 120	13 bytes
compressed bitmap 2	dream_descriptor + 220	13 bytes

So, with this in mind, it seems like our way to go here is to understand how these bitmaps are constructed and how we can recover the keystream from them. Once we understand this algorithm, we’ll be able to take the bitmaps in flag.txt.enc, recover the keystream, and XOR the encrypted data against it - which will reveal the flag.

Let’s now understand how the keystream is generated, and for that, we’ll take a look at the function create_dream, which is used to construct each byte of the keystream, with which the binary XORs every byte of the original data:

__int64 __fastcall create_dream(__int64 dream_descriptor)
{
  // [...]
  idx = _setjmp(env);
  if ( idx <= 7 )
  {
    _nightmare((dream_descriptor + 120), (dream_descriptor + 220), 0, 0);
    longjmp(env, idx + 1);
  }
  return 0LL;
}

This looks pretty weird. The decompiler tells us that this function always returns 0. If that had been the case, we should have seen the flag in flag.txt.enc, as XORing with 0 is an “identity” operation. It’s more likely that the decompiler is lying here. Let’s read the man page of setjmp and longjmp to find out why this happens:

DESCRIPTION
       The  functions described on this page are used for performing "nonlo‐
       cal gotos": transferring execution from one function to  a  predeter‐
       mined  location  in  another function.  The setjmp() function dynami‐
       cally establishes the target to which control will  later  be  trans‐
       ferred, and longjmp() performs the transfer of execution.

       The setjmp() function saves various information about the calling en‐
       vironment (typically, the stack  pointer,  the  instruction  pointer,
       possibly  the  values  of other registers and the signal mask) in the
       buffer env for later use by longjmp().  In this  case,  setjmp()  re‐
       turns 0.

       The  longjmp() function uses the information saved in env to transfer
       control back to the point where setjmp() was called  and  to  restore
       ("rewind")  the  stack to its state at the time of the setjmp() call.
       In addition, and depending on the  implementation  (see  NOTES),  the
       values of some other registers and the process signal mask may be re‐
       stored to their state at the time of the setjmp() call.

       Following a successful longjmp(), execution continues as if  setjmp()
       had  returned  for  a second time.  This "fake" return can be distin‐
       guished from a true setjmp() call because the "fake"  return  returns
       the  value  provided in val.  If the programmer mistakenly passes the
       value 0 in val, the "fake" return will instead return 1.

So that’s pretty interesting. This pair of setjmp and longjmp calls allow us to save the execution context at a certain point of the program, and then jump back to it later on. The standard library manages everything to us - restoring the stack pointer, the registers, and even the signal mask. In addition, it allows us to choose a custom return value for setjmp that can help us identify the point from where we’ve returned to that context. Let’s now examine the function again, with this knowledge:

idx = _setjmp(env);
if ( idx <= 7 )
{
  _nightmare((dream_descriptor + 120), (dream_descriptor + 220), 0, 0);
  longjmp(env, idx + 1);
}

As we can see, this is used by the challenge writer for implementing loops! Instead of using a regular for-loop construct, the writer decided to make our lives harder by using these weird non-local goto constructs for implementing loops. This loop is actually identical to for (int idx = 0; idx <= 7; ++idx) { ... }. It seems like IDA fails to decompile this function well due to the usage of these constructs, since they lead the elimination of important code IDA assumes to be “dead”:

mov     rax, [rbp+var_E8]
movzx   edx, byte ptr [rax+78h]
mov     rax, [rbp+var_E8]
movzx   eax, byte ptr [rax+0DCh]
xor     eax, edx
movzx   edx, al
mov     eax, [rbp+var_D4]
mov     ecx, eax
shl     edx, cl
mov     eax, edx
mov     edx, eax
movzx   eax, [rbp+var_D5]
or      eax, edx
mov     [rbp+var_D5], al
mov     rax, [rbp+var_E8]
lea     rsi, [rax+0DCh]
mov     rax, [rbp+var_E8]
add     rax, 78h ; 'x'
mov     ecx, 0
mov     edx, 0
mov     rdi, rax
call    __nightmare
mov     eax, [rbp+var_D4]
lea     edx, [rax+1]
lea     rax, [rbp+env]
mov     esi, edx        ; val
mov     rdi, rax        ; env
call    _longjmp

We can fix this behavior of IDA by patching the function and replacing the usage of setjmp and longjmp with regular control-flow constructs.

First, we’ll have to replace the setjmp construct:

lea     rax, [rbp+env]
mov     rdi, rax        ; env
call    __setjmp
endbr64

With:

xor     eax, eax

loop_entry:
endbr64

Second, we’ll have to replace the longjmp construct:

lea     edx, [rax+1]
lea     rax, [rbp+env]
mov     esi, edx        ; val
mov     rdi, rax        ; env
call    _longjmp

With:

lea     eax, [rax+1]
jmp     loop_entry

Note that we didn’t have to do any kind of complex patches here as the code itself doesn’t modify the stack pointer or other saved pieces of context between the setjmp and the longjmp calls. This would have been a bit more complex if we had had to do the mentioned kind of context restoration. Anyway, let’s take a look at the decompilation of the function now:

__int64 __fastcall create_dream(__int64 dream_descriptor)
{
  // [...]
  ret_value = 0;
  for ( i = 0; i <= 7; ++i )
  {
    ret_value |= (*(dream_descriptor + 120) ^ *(dream_descriptor + 220)) << i;
    _nightmare((dream_descriptor + 120), (dream_descriptor + 220), 0, 0);
  }
  return ret_value;
}

Sweet! We’ve got a much cleaner decompilation of this function, and now we understand that each byte of the keystream is a bitmap that is constructed by XORing the first bit of the first bitmap by the first bit of the second bitmap in a loop of 8 iterations (8 bits == 1 byte). After each XOR operation, there’s a call to _nightmare (that’s the original symbol), which probably does some manipulation on these bitmaps. Let’s dive into it now:

void __fastcall _nightmare(_BYTE *bitmap_1, _BYTE *bitmap_2, char always_0, char always_0_)
{
  // [...]
  _nightmare_r(bitmap_1, 0, bitmap_1[14] ^ bitmap_2[1]);
  _nightmare_s(bitmap_2, 0, bitmap_1[4] ^ bitmap_2[51]);
}

Pretty simple - each bitmap is manipulated using a different “nightmare” function. Note - the second argument to each one of these functions is not necessarily 0, but I changed the decompilation here to that as the only call to this function we’re intersted in leads to these arguments being 0, so this simplification just clears some mess :)

Let’s start by taking a look at _nightmare_r, which is used to manipulate bitmap_1:

void __fastcall _nightmare_r(__int64 bitmap_1, char always_0, char should_salt)
{
  // [...]
  v14 = 867231021857440000LL;
  v15 = 0x1C1916151413100DLL;
  qmemcpy(v16, "%&)*-.2468:<=?@ABCGHOPQRWXYZ[\\^_`a", 34);
  v11 = always_0 ^ *(bitmap_1 + 99);
  v12 = _setjmp(env);
  if ( v12 <= 99 )
  {
    if ( v12 )
      *(v17 + v12) = *(v12 - 1LL + bitmap_1);
    else
      LOBYTE(v17[0]) = 0;
    if ( v12 == v14 )
      *(v17 + v12) ^= v11;
    if ( should_salt )
      *(v17 + v12) ^= *(v12 + bitmap_1);
    longjmp(env, v12 + 1);
  }
  v3 = v17[1];
  *bitmap_1 = v17[0];
  *(bitmap_1 + 8) = v3;
  v4 = v17[3];
  *(bitmap_1 + 16) = v17[2];
  *(bitmap_1 + 24) = v4;
  v5 = v17[5];
  *(bitmap_1 + 32) = v17[4];
  *(bitmap_1 + 40) = v5;
  v6 = v17[7];
  *(bitmap_1 + 48) = v17[6];
  *(bitmap_1 + 56) = v6;
  v7 = v17[9];
  *(bitmap_1 + 64) = v17[8];
  *(bitmap_1 + 72) = v7;
  v8 = v17[11];
  *(bitmap_1 + 80) = v17[10];
  *(bitmap_1 + 88) = v8;
  *(bitmap_1 + 96) = v18;
}

Again, a lot of setjmp and longjmp mess, but the general outline is pretty basic - this function just does some basic manipulation on bitmap_1, which we can re-implement in Python as a z3 script to find the original bitmap, given a specific bitmap. Before we do that, let’s try to beautify & fix the decompilation a bit using our patch trick:

void __fastcall _nightmare_r(char *bitmap_1, char input, char should_salt)
{
  // [...]
  *(_QWORD *)xor_indicies = 0xC09060504030100LL;
  *(_QWORD *)&xor_indicies[8] = 0x1C1916151413100DLL;
  qmemcpy(&xor_indicies[16], "%&)*-.2468:<=?@ABCGHOPQRWXYZ[\\^_`a", 34);
  idk = input ^ bitmap_1[99];
  xor_idx = 0;
  for ( idx = 0; idx <= 99; ++idx )
  {
    if ( idx )
      new_bitmap_1[idx] = bitmap_1[idx - 1];
    else
      new_bitmap_1[0] = 0;
    if ( idx == xor_indicies[xor_idx] )
    {
      new_bitmap_1[idx] ^= idk;
      ++xor_idx;
    }
    if ( should_salt )
      new_bitmap_1[idx] ^= bitmap_1[idx];
  }
  // [...]
}

This algorithm is pretty simple. The only tricky part here is the should_salt argument which controls the manipulation a bit, and is based on both bitmaps, making it some kind of “mutual” value. Keep that in mind for the next section. Anyway, this algorithm can be re-implemented in Python as follows:

xored_indicies = [0x00, 0x01, 0x03, 0x04, 0x05, 0x06, 0x09, 0x0c, 0x0d, 0x10, 0x13, 0x14, 0x15, 0x16, 0x19, 0x1c, 0x25, 0x26, 0x29, 0x2a, 0x2d, 0x2e, 0x32, 0x34, 0x36, 0x38, 0x3a, 0x3c, 0x3d, 0x3f, 0x40, 0x41, 0x42, 0x43, 0x47, 0x48, 0x4f, 0x50, 0x51, 0x52, 0x57, 0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5e, 0x5f, 0x60, 0x61, 0xdb, 0xf7, 0xff, 0x7f, 0x00, 0x00, 0xc0, 0xf4, 0xfc, 0xf7, 0xff, 0x7f, 0x00, 0x00]

def get_new_bitmap_1(bitmap_1, should_salt):
	new_bitmap_1 = [0] * 100
	idk = bitmap_1[99]
	xor_idx = 0
	for i in range(100):
		if i:
			new_bitmap_1[i] = bitmap_1[i - 1]
		if i == xored_indicies[xor_idx]:
			new_bitmap_1[i] = new_bitmap_1[i] ^ idk
			xor_idx = xor_idx + 1
		if should_salt:
			new_bitmap_1[i] = new_bitmap_1[i] ^ bitmap_1[i]
	return new_bitmap_1

The reason I love Python so much for this kind of scripts is the fact that it is not (really) strongly typed. In most cases, it is a disaster for developers, but when writing such scripts, we can implement functions that can accept both concrete values and z3 expressions. Thus, if we want to use it to reverse the operation of this algorithm, we can just call get_new_bitmap_1 where bitmap_1 is an array of z3 bit vectors, compare the result against the actual bitmap, and let z3 find the original bitmap. Similarly, we can call get_new_bitmap_1 where bitmap_1 is an actual array of bits, and validate our re-implementation.

Let’s now try to do the same for the second bitmap manipulation. Here is the fixed & beautified decompilation:

void __fastcall _nightmare_s(char *bitmap_2, char always_0, char should_salt)
{
  // [...]
  *(_QWORD *)&xor_matrices[0][0] = 0x10100000000LL;
  *(_QWORD *)&xor_matrices[0][8] = 0x1010101000100LL;
  *(_QWORD *)&xor_matrices[0][16] = 0x100010001000001LL;
  *(_QWORD *)&xor_matrices[0][24] = 0x1010001000100LL;
  // [...]
  *(_QWORD *)&xor_matrices[3][88] = 0x1000000000000LL;
  *(_DWORD *)&xor_matrices[3][96] = 0x1000000;
  xored = always_0 ^ bitmap_2[99];
  local_bitmap[0] = 0;
  local_bitmap[99] = bitmap_2[98];
  for ( i = 0; i + 1 <= 98; ++i )
  {
    val = i + 1;
    local_bitmap[val] = (xor_matrices[0][val] ^ bitmap_2[val]) & (bitmap_2[val + 1] ^ xor_matrices[1][val]) ^ bitmap_2[val - 1];
  }
  if ( should_salt )
  {
    for ( j = 0; j <= 99; ++j )
      new_bitmap_2[j] = xored ^ local_bitmap[j] ^ xor_matrices[3][j];
  }
  else
  {
    for ( k = 0; k <= 99; ++k )
      new_bitmap_2[k] = xored ^ local_bitmap[k] ^ xor_matrices[2][k];
  }
  // [...]
}

A bit more complex algorithm, but also pretty easy to re-implement in Python:

xor_matrices = [
    [
        0x00,
        0x00,
        0x00,
        0x00,
        0x01,
        0x01,
        # [...]
    ],
    # [...]
]

def get_new_bitmap_2(bitmap_2, should_salt):
    xored = bitmap_2[99]
    local_bitmap = [0] * 100
    new_bitmap_2 = [0] * 100
    local_bitmap[99] = bitmap_2[98]
    i = 0
    while i + 1 <= 98:
        i = i + 1
        local_bitmap[i] = (xor_matrices[0][i] ^ bitmap_2[i]) & (bitmap_2[i + 1] ^ xor_matrices[1][i]) ^ bitmap_2[i - 1]
    if should_salt:
        for j in range(100):
            new_bitmap_2[j] = xored ^ local_bitmap[j] ^ xor_matrices[3][j]
    else:
        for k in range(100):
            new_bitmap_2[k] = xored ^ local_bitmap[k] ^ xor_matrices[2][k]
    return new_bitmap_2

Noice. We’ve got Python re-implementations of the bitmap rotation / manipulation algorithms, and we have a solid understanding of the keystream generation mechanism.

In summary:

Each byte of the keystream is a bitmap that is constructed by XORing bitmap_1[0] and bitmap_2[0] in a loop.
After each XOR operation, the bitmaps are rotated using well-defined bitwise algorithms.
The rotation is done separately for each bitmap, but it has a should_salt argument which is mutual and based on both bitmaps. This is very important to keep in mind.

writing a decryption script

Let’s start with reminding what we’ve got. We have three pieces of information in our hands:

The original data XORed against the generated keystream.
The compressed form of the last value of the 1st bitmap.
The compressed form of the last value of the 2nd bitmap.

So, let’s starting with parsing this data:

raw_encrypted_file = open(argv[1], "rb").read()

encrypted_flag = raw_encrypted_file[:-26]

final_bitmap_1 = list(chain(*map(lambda byte: map(int, bin(byte)[2:].rjust(8, "0")[::-1]), raw_encrypted_file[-26:-13])))[:100]
final_bitmap_2 = list(chain(*map(lambda byte: map(int, bin(byte)[2:].rjust(8, "0")[::-1]), raw_encrypted_file[-13:])))[:100]

Now, we’d like to find out how to retrieve the keystream from these bitmaps. So, as we know, we can reverse the operation of each bitmap rotation algorithm using our re-implementations and z3, and get the original bitmap(s). However, we still have the tricky mutual should_salt value which is based on bits from the two bitmaps. The should_salt value of bitmap_1 is bitmap_1[14] ^ bitmap_2[1], and the second one is bitmap_1[4] ^ bitmap_2[51]. Each should_salt value can be either 0 or 1, so we have 4 options here - (0, 0), (0, 1), (1, 0) and (1, 1). We can guess each one of these values, do the reverse operation, and see if the should_salt values that the original bitmaps yield match the guessed should_salt values. This will not eliminate all options, but will definitely reduce the amount of options here. Let’s implement that:

def get_original_bitmaps(final_bitmap_1, final_bitmap_2):
    bitmap_1 = [BitVec(f"bitmap_1_{i}", 1) for i in range(100)]
    bitmap_2 = [BitVec(f"bitmap_2_{i}", 1) for i in range(100)]

    options = [
        (False, False),
        (False, True),
        (True, False),
        (True, True)
    ]

    results = []
    for (salt_1, salt_2) in options:
        new_bitmap_1 = get_new_bitmap_1(bitmap_1, salt_1)
        new_bitmap_2 = get_new_bitmap_2(bitmap_2, salt_2)

        constraints_1 = [new_bitmap_1[i] == final_bitmap_1[i] for i in range(100)]
        constraints_2 = [new_bitmap_2[i] == final_bitmap_2[i] for i in range(100)]

        s = Solver()
        for c in constraints_1 + constraints_2:
            s.add(c)

        while s.check() == sat:
            m = s.model()

            a = []
            for i in range(100):
                a.append(bitmap_1[i] != m[bitmap_1[i]])
                a.append(bitmap_2[i] != m[bitmap_2[i]])
            s.add(Or(*a))

            original_bitmap_1 = [int(repr(m[bitmap_1[i]])) for i in range(100)]
            original_bitmap_2 = [int(repr(m[bitmap_2[i]])) for i in range(100)]

            bit = original_bitmap_1[0] ^ original_bitmap_2[0]

            actual_salt_1 = (original_bitmap_1[14] ^ original_bitmap_2[1]) != 0
            actual_salt_2 = (original_bitmap_1[4] ^ original_bitmap_2[51]) != 0

            if salt_1 == actual_salt_1 and salt_2 == actual_salt_2:
                results.append(((original_bitmap_1, original_bitmap_2), bit))

    return results

Now, we’re able to recover all original bitmaps that can lead to having two specific final bitmaps after the rotation operations. Doing that recursively, we’ll be able to recover the original bitmap values. It’s important to note that for each 8 iterations, we should check if the generated keystream byte XORed against the matching encrypted byte yields a valid ASCII character. If that’s the case, we can continue and try to recover more characters in that recursion branch, until we recover the whole keystream. This will save us from spending a lot of CPU cycles on incorrect keystreams. Here’s the implementation of this algorithm:

def split(arr, n):
    return [arr[i:i+n] for i in range(0, len(arr), n)]

def recursive_recovery(final_bitmap_1, final_bitmap_2, n, bits=None):
    bits = [] if bits == None else bits

    if len(bits) > 0 and len(bits) % 8 == 0:
        key = int("".join(map(str, bits[-8:])), 2)
        e = encrypted_flag[len(encrypted_flag)-(len(bits)//8)]
        f = e ^ key
        if (f < 31 or f > 127) and f != 10:
            return None

    if n == 0:
        key = list(map(lambda raw: int("".join(map(str, raw)), 2), split(bits, 8)))[::-1]
        flag = "".join([chr(key[i] ^ encrypted_flag[i]) for i in range(len(encrypted_flag))])
        return [flag]

    results = get_original_bitmaps(final_bitmap_1, final_bitmap_2)
    if len(results) == 0:
        return None

    options = []
    for (original_bitmap_1, original_bitmap_2), bit in results:
        recovered = recursive_recovery(original_bitmap_1, original_bitmap_2, n - 1, bits + [bit])
        if recovered != None:
            options += recovered

    if len(options) == 0:
        return None

    return options

Now, we can just give this algorithm our parsed bitmaps as arguments, and print out the results:

recovered = set(recursive_recovery(final_bitmap_1, final_bitmap_2, len(encrypted_flag) * 8))
for flag in recovered:
    if "zer0pts{" not in flag:
        continue
    print(f"{flag}", end="")

Giving flag.txt.enc as an argument, the scripts takes ~30 minutes to complete (we can definitely optimize it), and yields the following result:

$ ./solve.py flag.txt.enc
zer0pts{Mu7u4l_1rr3gul4r_Cl0ck1ng_KEYstr34m_g3n3r470r}

Pure satisfaction!

conclusion

I really liked this challenge as the author included a nice anti-reversing trick I didn’t know of (replacing loops with setjmp-longjmp pairs), but didn’t take it to the extreme with all kinds of hardcore obfuscation techniques you commonly see in CTFs today. Instead, he kept the focus on writing the decryption script, which is the main challenge here IMO, and it personally helped me sharpen my z3 scripting skills, which I haven’t really practiced lately. Thank you ptr-yudai!

tags: rev - z3

m4gnum's blog

Just some random thoughts about reversing, exploitation, OS internals, hypervisors, compilers and platform security.