Skip to content

Tutorial #4: DCA against Karroumi 2010 challenge

Philippe Teuwen edited this page Jun 12, 2016 · 7 revisions

Here is the fourth tutorial of our series.

Specificities of this tutorial:

  • Learning how to focus on the interesting address range by observing visually one trace
  • Learning how to reduce the scope of the DPA phase by observing leakage positions and reducing number of traces, for a faster attack
  • Learning how to try other targets than the values after the S-boxes
  • Learning how to go back from the leakages to the corresponding source code lines in the white-box

The corresponding materials are available at https://github.com/SideChannelMarvels/Deadpool/tree/master/wbs_aes_karroumi2010

As there was no real challenge with a Karroumi implementation, we did ours based on the one of Dušan Klinec available at https://github.com/ph4r05/Whitebox-crypto-AES

You can either compile your own, following https://github.com/SideChannelMarvels/Deadpool/blob/master/wbs_aes_karroumi2010/target/README.md or use the compiled binary https://github.com/SideChannelMarvels/Deadpool/tree/master/wbs_aes_karroumi2010/target/main

The binary is both a generator of white-box tables and an engine to encrypt/decrypt based on those tables (and a BGE attack tool too ^^, which was the whole point of Dušan to implement Karroumi!).

First step is to generate new white-box tables:

./main --create-table tables_karroumi_noextenc.tbl

Note that currently serializing external encodings in those tables does not work. We used an awful hack when we wrote the paper and I hope the upstream code will be fixed soon so that we can try properly external encodings, but for now we won't use external encodings (this doesn't influence the results anyway).

Usage of the newly created tables is as following:

./main --load-tables tables_karroumi_noextenc.tbl --input-files <(echo 6bc1bee22e409f96e93d7e117393172a|xxd -r -p) --out-file >(xxd -p)
3ad77bb40d7a3660a89ecaf32466ef97

So, acquiring a first full trace is done with:

Tracer -t sqlite -- ./main --load-tables tables_karroumi_noextenc.tbl --input-files <(echo 6bc1bee22e409f96e93d7e117393172a|xxd -r -p) --out-file >(xxd -p)

The trace is about 1Gb large.

Fire tracegraph, load the sqlite trace and rescale with the Overview zoom.

Full trace

The trace is very large but what takes most of the time is the initialization of the Boost libraries and the loading of the serialized tables. But the crypto part we're interested in is the little structure visible on the bottom left. Zoomed in, it gives:

Trace around AES

Roughly the AES is around 0x451150-0x4555a0.

Alternatively, to find the range of addresses to trace, one can look at the binary... WBAES::encdec() is the function using the white-box tables.

nm main -S |grep encdec
0000000000451150 0000000000004cdd T _ZN5WBAES6encdecER6_W128Bb

which gives 0x451150-0x455e2d

But seeing the loops, we can probably restrict ourselves to about 0x4512c0-0x452477:

Trace around AES, with restricted range

"Real" stack (so after the few tables dynamically loaded on the stack) is after 0x7ffffff68000. You may find this address if you zoom adequately.

With this information, we can mount the attack with the following parameters:

from deadpool_dca import *

def processinput(iblock, blocksize):
    return ["--load-tables tables_karroumi_noextenc.tbl --input-files <(echo %0*x|xxd -r -p) --out-file >(xxd -p)" % (2*blocksize, iblock)]

def processoutput(output, blocksize):
    return int(''.join([x for x in output.split('\n') if len(x)==32][0]), 16)

filters=[Filter('mem_addr1_rw1', ['R', 'W'], lambda stack_range, addr, size, data: addr < 0x7ffffff68000 and size == 1, lambda addr, size, data: addr & 0xFF, '<B'),
         Filter('mem_data_rw1',  ['R', 'W'], lambda stack_range, addr, size, data: addr < 0x7ffffff68000 and size == 1, lambda addr, size, data: data, '<B')]

T=TracerPIN('./main', processinput, processoutput, ARCH.amd64, 16, addr_range='0x4512c0-0x452477', filters=filters, shell=True)
T.run(2000)
bin2daredevil(keywords=filters,
              configs={'attack_sbox':   {'algorithm':'AES', 'position':'LUT/AES_AFTER_SBOX',    'correct_key':'0x2b7e151628aed2a6abf7158809cf4f3c'},
                       'attack_multinv':{'algorithm':'AES', 'position':'LUT/AES_AFTER_MULTINV', 'correct_key':'0x2b7e151628aed2a6abf7158809cf4f3c'}})

Results will vary with your own randomly generated white-box. It might even be that you can recover the full key at the first step if you're lucky but here is a typical example where the attack may fail on some byte(s) and how to break them. Best results are when using last byte of addresses and considering the highest absolute correlation amongst the bit correlations, so I removed the information about sum(correlations) for clarity:

daredevil -c mem_addr1_rw1_2000_17096.attack_sbox.config |egrep --line-buffered "(Best|0:)"|grep -v "sum"
Best 10 candidates for key byte #0 according to highest abs(bit_correlations):
 0: 0x2b peak: 0.384138  <==
Best bit: 4 rank: 0.       -0.384138    0x2b    2071
Best 10 candidates for key byte #1 according to highest abs(bit_correlations):
 0: 0x41 peak: 0.260993
Best bit: 2 rank: 0.        0.257112    0x7e    3211
Best 10 candidates for key byte #2 according to highest abs(bit_correlations):
 0: 0x15 peak: 0.25414   <==
Best bit: 0 rank: 0.       -0.229832    0x15    2942
Best 10 candidates for key byte #3 according to highest abs(bit_correlations):
 0: 0x16 peak: 0.520946  <==
Best bit: 0 rank: 0.       -0.250154    0x16    2496
Best 10 candidates for key byte #4 according to highest abs(bit_correlations):
 0: 0x28 peak: 0.619194  <==
Best bit: 0 rank: 0.       -0.406104    0x28    2463
Best 10 candidates for key byte #5 according to highest abs(bit_correlations):
 0: 0xae peak: 0.757938  <==
Best bit: 0 rank: 0.       -0.268071    0xae    2050
Best 10 candidates for key byte #6 according to highest abs(bit_correlations):
 0: 0xd2 peak: 0.379755  <==
Best bit: 2 rank: 0.        0.260265    0xd2    3327
Best 10 candidates for key byte #7 according to highest abs(bit_correlations):
 0: 0xa6 peak: 0.496979  <==
Best bit: 1 rank: 0.         0.27145    0xa6    2914
Best 10 candidates for key byte #8 according to highest abs(bit_correlations):
 0: 0xab peak: 0.34484   <==
Best bit: 0 rank: 0.        -0.34484    0xab    2837
Best 10 candidates for key byte #9 according to highest abs(bit_correlations):
 0: 0xf7 peak: 0.542857  <==
Best bit: 2 rank: 0.        0.542857    0xf7    2442
Best 10 candidates for key byte #10 according to highest abs(bit_correlations):
 0: 0x15 peak: 0.275168  <==
Best bit: 6 rank: 0.        0.275168    0x15    2119
Best 10 candidates for key byte #11 according to highest abs(bit_correlations):
 0: 0x88 peak: 0.54094   <==
Best bit: 2 rank: 0.       -0.527773    0x88    3264
Best 10 candidates for key byte #12 according to highest abs(bit_correlations):
 0: 0x09 peak: 0.500975  <==
Best bit: 0 rank: 0.       -0.500975    0x09    3245
Best 10 candidates for key byte #13 according to highest abs(bit_correlations):
 0: 0xcf peak: 0.495989  <==
Best bit: 4 rank: 0.       -0.495989    0xcf    2819
Best 10 candidates for key byte #14 according to highest abs(bit_correlations):
 0: 0x4f peak: 0.393279  <==
Best bit: 1 rank: 0.       -0.241375    0x4f    2542
Best 10 candidates for key byte #15 according to highest abs(bit_correlations):
 0: 0x72 peak: 0.288065

We see almost all bytes can be properly recovered, only byte #1 cannot be recovered.
We see also that the useful leakages occur roughly between sample 2000 and sample 3350 so we know how to replay the attack on potentially other instances, but faster:

Edit mem_addr1_rw1_2000_17096.attack_sbox.config

index=2000
nsamples=1350

Attack takes now 2s per key byte, so 32s in total.
We may even reduce the number of traces to about 500 (by trial & error), which brings the attack down to 8s.

trace=mem_addr1_rw1_2000_17096.trace 500 17096
guess=mem_addr1_rw1_2000_17096.input 500 16

About the missing key byte #1, it's trivial to brute-force but we can try something else: Karroumi dual ciphers are about alternative affine transformations so we might e.g. try as target the application of the multiplicative inverse to the data, a step normally internal to the S-box. But first, apply the same changes in mem_addr1_rw1_2000_17096.attack_multinv.config to execute this second attack faster.

daredevil -c mem_addr1_rw1_2000_17096.attack_multinv.config |egrep --line-buffered "(Best|0:)"|grep -v sum
Best 10 candidates for key byte #0 according to highest abs(bit_correlations):
 0: 0xfc peak: 0.260916
Best bit: 3 rank: 1.        0.250943    0x2b    2085
Best 10 candidates for key byte #1 according to highest abs(bit_correlations):
 0: 0x7e peak: 0.502287  <==
Best bit: 0 rank: 0.       -0.257893    0x7e    3203
Best 10 candidates for key byte #2 according to highest abs(bit_correlations):
 0: 0x15 peak: 0.367983  <==
Best bit: 0 rank: 0.        0.242405    0x15    2927
Best 10 candidates for key byte #3 according to highest abs(bit_correlations):
 0: 0x16 peak: 0.746864  <==
Best bit: 0 rank: 0.        0.746864    0x16    2554
Best 10 candidates for key byte #4 according to highest abs(bit_correlations):
 0: 0x28 peak: 0.410953  <==
Best bit: 2 rank: 0.        0.255894    0x28    2477
Best 10 candidates for key byte #5 according to highest abs(bit_correlations):
 0: 0xae peak: 0.49706   <==
Best bit: 3 rank: 0.        -0.49706    0xae    2104
Best 10 candidates for key byte #6 according to highest abs(bit_correlations):
 0: 0xd2 peak: 0.409918  <==
Best bit: 0 rank: 0.        0.409918    0xd2    3278
Best 10 candidates for key byte #7 according to highest abs(bit_correlations):
 0: 0xa6 peak: 0.54717   <==
Best bit: 1 rank: 0.       -0.448853    0xa6    2907
Best 10 candidates for key byte #8 according to highest abs(bit_correlations):
 0: 0x32 peak: 0.264   
Best 10 candidates for key byte #9 according to highest abs(bit_correlations):
 0: 0xf7 peak: 0.511092  <==
Best bit: 1 rank: 0.       -0.511092    0xf7    2450
Best 10 candidates for key byte #10 according to highest abs(bit_correlations):
 0: 0x6c peak: 0.273054
Best bit: 7 rank: 0.         -0.2561    0x15    2165
Best 10 candidates for key byte #11 according to highest abs(bit_correlations):
 0: 0x88 peak: 0.531063  <==
Best bit: 3 rank: 0.        0.531063    0x88    3299
Best 10 candidates for key byte #12 according to highest abs(bit_correlations):
 0: 0x09 peak: 0.393156  <==
Best bit: 4 rank: 0.         -0.2522    0x09    3246
Best 10 candidates for key byte #13 according to highest abs(bit_correlations):
 0: 0xcf peak: 0.324364  <==
Best bit: 0 rank: 0.        0.324364    0xcf    2827
Best 10 candidates for key byte #14 according to highest abs(bit_correlations):
 0: 0x10 peak: 0.256703
Best bit: 4 rank: 1.        0.242052    0x4f    2541
Best 10 candidates for key byte #15 according to highest abs(bit_correlations):
 0: 0x39 peak: 0.257039
Best bit: 0 rank: 1.       -0.242115    0x3c    2121

This time a few other bytes were not recovered but we got byte #1! So by combining both attacks we managed to recover the full key. In the rare event one byte is not revealed by an attack or the other, try to brute-force it (it's easy by comparing the relative correlation scores if a candidate seems legit or not) or to target other affine transforms than in the original S-box...

One step further:
In this exercise, for once, we've the source code. So let's see how to we can go back to the source and pinpoint the lines where the leaks occur.

From daredevil's reports let's find the best leaks (grep "rank: 0"):

Best bit: 0 rank: 0.       -0.292396    0x7e    3203
Best bit: 0 rank: 0.        0.280328    0x15    2927
Best bit: 0 rank: 0.        0.739881    0x16    2554
Best bit: 3 rank: 0.       -0.422051    0x28    2471
Best bit: 3 rank: 0.       -0.517008    0xae    2104
Best bit: 0 rank: 0.        0.415916    0xd2    3278
Best bit: 1 rank: 0.       -0.403762    0xa6    2907
Best bit: 1 rank: 0.        -0.52469    0xf7    2450
Best bit: 3 rank: 0.        0.508487    0x88    3297
Best bit: 4 rank: 0.       -0.271449    0x09    3246
Best bit: 0 rank: 0.        0.320264    0xcf    2827
Best bit: 4 rank: 0.       -0.407483    0x2b    2071
Best bit: 2 rank: 0.        0.317442    0x7e    3219
Best bit: 5 rank: 0.       -0.288184    0x15    2909
Best bit: 4 rank: 0.       -0.531342    0x16    2498
Best bit: 0 rank: 0.       -0.386349    0x28    2463
Best bit: 0 rank: 0.       -0.295254    0xae    2050
Best bit: 2 rank: 0.        0.267346    0xd2    3327
Best bit: 4 rank: 0.       -0.306637    0xa6    2929
Best bit: 3 rank: 0.        -0.30787    0xab    2854
Best bit: 2 rank: 0.        0.514181    0xf7    2442
Best bit: 6 rank: 0.        0.279898    0x15    2119
Best bit: 2 rank: 0.       -0.583486    0x88    3264
Best bit: 0 rank: 0.       -0.432023    0x09    3245
Best bit: 4 rank: 0.       -0.448291    0xcf    2819
Best bit: 1 rank: 0.       -0.245292    0x4f    2542

Last column is the sample number. We know how we filtered instructions and how we constructed the traces, so we know to which instructions those leaks correspond. But that would be more interesting to know the source line.

So, first, recompile main with debug info activated in CMakeLists.txt:

-set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -O2 -std=c++0x")
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -g -O2 -std=c++0x")

Let's rename that version make_debug to avoid confusion.

Check if the targeted function is still at the same addresses or not:

nm main -S |grep encdec
0000000000451150 0000000000004cdd T _ZN5WBAES6encdecER6_W128Bb
nm main_debug -S |grep encdec
0000000000451110 0000000000004cdd T _ZN5WBAES6encdecER6_W128Bb

Hmm the function is now 0x40 bytes earlier, so let's adapt the filtering range to 0x451280-0x452437.

There is a function sample2event to convert a sample number into an instruction and, if debug info is available, a source line. It's based on the special trace .info that is created when acquiring traces. So let's acquire one trace again with main_debug to get the right addresses. But first delete the existing .info:

rm *.info
from deadpool_dca import *

def processinput(iblock, blocksize):
    return ["--load-tables tables_karroumi_noextenc.tbl --input-files <(echo %0*x|xxd -r -p) --out-file >(xxd -p)" % (2*blocksize, iblock)]

def processoutput(output, blocksize):
    return int(''.join([x for x in output.split('\n') if len(x)==32][0]), 16)

filters=[Filter('mem_addr1_rw1', ['R', 'W'], lambda stack_range, addr, size, data: addr < 0x7ffffff68000 and size == 1, lambda addr, size, data: addr & 0xFF, '<B')]
T=TracerPIN('./main_debug', processinput, processoutput, ARCH.amd64, 16, addr_range='0x451280-0x452437', filters=filters, shell=True)
T.run(1)
for sample in [3203, 2927, 2554, 2471, 2104, 3278, 2907, 2450, 3297, 3246, 2827, 2071, 3219, 2909, 2498, 2463, 2050, 3327, 2929, 2854, 2442, 2119, 3264, 3245, 2819, 2542]:
    print sample2event(sample, filters[0], '../target/main_debug')

Result is:

(401, [('[R]', 817, 4528849, 140736938848444, 1, 1, 'WBAES.h:202 (discriminator 2)')])
(366, [('[R]', 746, 4529440, 140736938839257, 1, 3, 'WBAES.h:202 (discriminator 2)')])
(320, [('[R]', 663, 4529532, 140736938827493, 1, 15, 'WBAES.h:202 (discriminator 2)')])
(309, [('[R]', 651, 4529061, 140736938824847, 1, 15, 'WBAES.h:202 (discriminator 2)')])
(263, [('[R]', 568, 4529150, 140736938813083, 1, 10, 'WBAES.h:202 (discriminator 2)')])
(410, [('[R]', 826, 4529260, 140736938850783, 1, 4, 'WBAES.h:202 (discriminator 2)')])
(364, [('[R]', 743, 4529346, 140736938838897, 1, 7, 'WBAES.h:202 (discriminator 2)')])
(307, [('[R]', 649, 4528943, 140736938824282, 1, 2, 'WBAES.h:202 (discriminator 2)')])
(413, [('[R]', 829, 4529422, 140736938851634, 1, 3, 'WBAES.h:202 (discriminator 2)')])
(406, [('[R]', 822, 4529073, 140736938849295, 1, 5, 'WBAES.h:202 (discriminator 2)')])
(354, [('[R]', 733, 4528885, 140736938836477, 1, 9, 'WBAES.h:202 (discriminator 2)')])
(259, [('[R]', 564, 4528943, 140736938812046, 1, 7, 'WBAES.h:202 (discriminator 2)')])
(403, [('[R]', 819, 4528943, 140736938848926, 1, 2, 'WBAES.h:202 (discriminator 2)')])
(364, [('[R]', 743, 4529346, 140736938838897, 1, 7, 'WBAES.h:202 (discriminator 2)')])
(313, [('[R]', 655, 4529190, 140736938825927, 1, 10, 'WBAES.h:202 (discriminator 2)')])
(308, [('[R]', 650, 4529036, 140736938825080, 1, 0, 'WBAES.h:202 (discriminator 2)')])
(257, [('[R]', 562, 4528849, 140736938811471, 1, 12, 'WBAES.h:202 (discriminator 2)')])
(416, [('[R]', 833, 4529532, 140736938851875, 1, 1, 'WBAES.h:202 (discriminator 2)')])
(367, [('[R]', 747, 4529521, 140736938839954, 1, 8, 'WBAES.h:202 (discriminator 2)')])
(357, [('[R]', 736, 4529061, 140736938837138, 1, 10, 'WBAES.h:202 (discriminator 2)')])
(306, [('[R]', 648, 4528885, 140736938824171, 1, 4, 'WBAES.h:202 (discriminator 2)')])
(265, [('[R]', 570, 4529190, 140736938813690, 1, 2, 'WBAES.h:202 (discriminator 2)')])
(408, [('[R]', 824, 4529169, 140736938850187, 1, 14, 'WBAES.h:202 (discriminator 2)')])
(406, [('[R]', 822, 4529073, 140736938849295, 1, 5, 'WBAES.h:202 (discriminator 2)')])
(353, [('[R]', 732, 4528849, 140736938836169, 1, 10, 'WBAES.h:202 (discriminator 2)')])
(318, [('[R]', 661, 4529440, 140736938826895, 1, 15, 'WBAES.h:202 (discriminator 2)')])

Where we get for each sample the corresponding event number, followed by informationextracted from the .info file: memory mode, item, instruction address, memory address, memory size, memory data, and finally the corresponding source line.

Well here all leaks seem to come from the same line, which is actually:

    OP8XOR(o1, o2, xtb, res);

cf https://github.com/ph4r05/Whitebox-crypto-AES/blob/38f1127db2df22c62490f72b846263644c9c3ce8/WBAES.h#L202

It's an inlined macro heavily used in the core of the white-box code, cf https://github.com/ph4r05/Whitebox-crypto-AES/blob/38f1127db2df22c62490f72b846263644c9c3ce8/WBAES.cpp#L171

This toturial ends here, but if you want to go further, you may e.g. inline yourself the macro so you'll know which lines of WBAES.cpp leak and from there, two options:

  • Try to fix Karroumi :)
  • Make a faster attack out of the source code by dumping immediately the address bytes at those lines to a file, which would be far more efficient than using instrumentation on the binary.

So the binary would embed itself a tool to attack other instances of the tables, which is already what happens with the BGE attack embedded by Dušan in main:

See option --bench-bge [=arg(=0)] (=0) Benchmarking rounds for AES BGE attack