Tuesday 22 November 2016

Constraint Programming with python-constraint

I have recently been looking at constraint programming and I thought I would post some of my code as I go. This is using the python-constraint package.

Doing things with python-constraint is pretty easy. To constrain a variable to have a certain value, we can use InSetConstraint. To make sure all the variables in a set are different, we use the AllDifferentConstraint. These are enough to solve sudoku.

Sudoku

I don't think Sudoku needs any introduction. The constraints are fairly easy to specify: each row must be all different, each column must be all different and each 3x3 block must be all different. We also have to ensure that the given starting state is used; for this we pin each given square to its value using an InSetConstraint with a single value. We use one variable for each square in the sudoku grid, so 81 variables all up.

from constraint import *
problem = Problem()
problem.addVariables(range(9*9), range(1,10))

# specify the given starting sudoku problem here, dots are blank squares
sudoku = '\
.42986.1.\
....2...5\
......82.\
4.1..793.\
.683.254.\
.758..1.2\
.39......\
7...4....\
.1.67325.'

# Make sure our solution matches the specified starting thing
for i,char in enumerate(sudoku):
    if char != '.':  problem.addConstraint(InSetConstraint([int(char)]),[i])
    
# add row and col constraints
for i in range(9): 
    problem.addConstraint(AllDifferentConstraint(),range(9*i,9*i+9))
for i in range(9): 
    problem.addConstraint(AllDifferentConstraint(),range(i,81,9))
# add block constraints
for i in [0,3,6,27,30,33,54,57,60]: 
    ind = []
    for j in range(3):
        for k in range(3):
            ind.append(i + j*9 + k)
    problem.addConstraint(AllDifferentConstraint(),ind)

# pretty print the output, one row of the grid per line
a = problem.getSolution()
for i in range(9):
    print(' '.join(str(a[i*9+j]) for j in range(9)))
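
The index arithmetic for the rows, columns and blocks is easy to get wrong, so here is a small sketch (plain Python, no python-constraint needed) that derives the same index lists from the 0..80 square numbering. The helper names are my own invention and are not part of the package.

```python
def row_indices(r):
    # squares in row r, left to right
    return list(range(9 * r, 9 * r + 9))

def col_indices(c):
    # squares in column c, top to bottom
    return list(range(c, 81, 9))

def block_indices(b):
    # blocks are numbered 0..8, left to right then top to bottom
    start = 27 * (b // 3) + 3 * (b % 3)
    return [start + 9 * j + k for j in range(3) for k in range(3)]

# the top-left square of each block, as hard-coded in the solver above
block_starts = [block_indices(b)[0] for b in range(9)]
print(block_starts)
```

Each of these lists could be handed straight to addConstraint together with an AllDifferentConstraint.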

The Golomb Ruler

A bit of info on Golomb Rulers: here. If you know how to improve the code below, leave a comment. Golomb rulers are described by their order, which is the number of marks on their edge. The length of a Golomb ruler is the distance between the outer two marks and represents the longest distance it can measure.

There is no requirement that a Golomb ruler measures all distances up to its length. However, if a ruler does measure all distances, it is classified as a perfect Golomb ruler. There are no perfect Golomb rulers above order 4.

Finally, a Golomb ruler is described as optimal if no shorter ruler of the same order exists. The code below first looks for a perfect ruler, then increments the length by one each time until an optimal ruler is found. It starts to slow down by order = 7, where it takes a few minutes.

This problem is harder than the sudoku problem: we have to use a set of auxiliary variables to represent the pairwise differences, then apply the AllDifferentConstraint to the auxiliary variables.

from constraint import *
from itertools import combinations

order = 7 # the number of marks on the Golomb ruler
diffs = [] # the set of pairwise differences between marks
for i in combinations(range(order),2): diffs.append(i)
length = len(diffs)

def lessthan(a,b): return a < b
def diffeq(a,b,res): return (b-a)==res     

while True:
    print('trying length: '+str(length))
    problem = Problem()
    problem.addVariables(range(order), range(0,length+1))
    problem.addVariables(diffs,    range(1,length+1))

    # make sure the first mark is 0 and last is length
    problem.addConstraint(InSetConstraint([0]),[0])
    problem.addConstraint(InSetConstraint([length]),[order-1])

    # ensure that the marks are in increasing order
    # make sure all possible pairwise relations are constrained
    for i in diffs: problem.addConstraint(lessthan,[i[0],i[1]])

    # ensure that the differences reflect the marks
    for i in diffs: problem.addConstraint(diffeq,[i[0],i[1],i])
    
    # all the differences must be different    
    problem.addConstraint(AllDifferentConstraint(),diffs)
   
    # pretty print the output, one ruler per line
    solutions = problem.getSolutions()
    for ruler in solutions:
        print(' '.join(str(ruler[mark]) for mark in range(order)))
    if len(solutions) > 0: break # if we find some solutions, quit
    length = length + 1

Example output for order = 7. Note that the shortest ruler has length 25, though a perfect ruler would have length 21:

trying length: 21
trying length: 22
trying length: 23
trying length: 24
trying length: 25
0 4 9 15 22 23 25
0 3 4 12 18 23 25
0 2 3 10 16 21 25
0 2 5 14 18 24 25
0 2 6 9 14 24 25
0 2 7 13 21 22 25
0 2 7 15 21 24 25
0 1 4 10 18 23 25
0 1 7 11 20 23 25
0 1 11 16 19 23 25
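
To double check rulers like the ones printed above without going through the solver, a few lines of plain Python are enough. This is just a sketch; the function names are mine, and the marks are assumed to be given in increasing order.

```python
from itertools import combinations

def differences(marks):
    # all pairwise distances between marks (marks assumed sorted ascending)
    return [b - a for a, b in combinations(marks, 2)]

def is_golomb(marks):
    # a Golomb ruler measures every distance at most once
    d = differences(marks)
    return len(set(d)) == len(d)

def is_perfect(marks):
    # a perfect ruler measures every distance from 1 up to its length
    length = marks[-1] - marks[0]
    return is_golomb(marks) and sorted(differences(marks)) == list(range(1, length + 1))

ruler = [0, 1, 4, 10, 18, 23, 25]
print(is_golomb(ruler), is_perfect(ruler))
```

The order 4 ruler 0 1 4 6 passes both checks, whereas the order 7 rulers above are Golomb rulers but not perfect ones.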

Magic Squares

Magic Squares seem to be very common examples when doing constraint programming, so I may as well include the code for them. Info on Magic Squares can be found here. This code is quite slow; I thought it would be much faster. This may be because python-constraint is just kind of slow, or maybe the problem is harder than I thought.

from constraint import *

problem = Problem()
size = 4
problem.addVariables(range(size*size), range(1,size*size+1))
total = size*(size*size + 1)//2  # the magic constant, kept as an integer

# make sure rows add to total
for i in range(size): 
    problem.addConstraint(ExactSumConstraint(total),range(i*size,i*size+size))
for i in range(size): 
    problem.addConstraint(ExactSumConstraint(total),range(i,size*size,size))
# add the diagonal constraints
diag1,diag2 = [],[]
for i in range(size):
    diag1.append(i*size + i)
    diag2.append(i*size + (size-i-1))

problem.addConstraint(ExactSumConstraint(total),diag1)
problem.addConstraint(ExactSumConstraint(total),diag2)
problem.addConstraint(AllDifferentConstraint())
   
# pretty print the output, one row of the square per line
solution = problem.getSolution()
for i in range(size):
    print(' '.join(str(solution[i*size + j]) for j in range(size)))

Monday 14 November 2016

Terraforming Venus

I thought I would look at some of the figures involved in terraforming Venus, discussions of which can be found all over the place e.g. here. The numbers from this page come from here. Venus is the next planet from Earth as you go towards the Sun. It has a year of 224.7 Earth days, and a day length of 243 Earth days (longer than the year!). It also rotates in the opposite direction to most other planets, likely due to some large collision early in its life.

I'll quote a little bit from wikipedia: Venus has an extremely dense atmosphere composed of 96.5% carbon dioxide, 3.5% nitrogen, and traces of other gases, most notably sulfur dioxide. The mass of its atmosphere is 93 times that of Earth's, whereas the pressure at its surface is about 92 times that at Earth's—a pressure equivalent to that at a depth of nearly 1 kilometre under Earth's oceans. The density at the surface is 65 kg/m3, 6.5% that of water or 50 times as dense as Earth's atmosphere at 20 °C at sea level. The CO2-rich atmosphere generates the strongest greenhouse effect in the Solar System, creating surface temperatures of at least 735 K (462 °C).

Terraforming Venus would involve removing most of the sulfuric acid and carbon dioxide. Needless to say, this is a monumental task that we don't currently have the technology for. Some of the truly staggering figures:

Total Mass of Venus Atmosphere	= 4.80E+20 kg
Percent Atmosphere CO2	= 96.50%
Total Mass of CO2	= 4.63E+20 kg

For comparison:

Total Mass of Earth's Atmosphere	= 5.10E+18 kg
Mass Ratio Venus to Earth	= 94.118

The Bosch reaction (CO2 + 2H2 -> C + 2H2O), which combines hydrogen with carbon dioxide to make graphite and water, could be used to remove Venus' carbon dioxide. This reaction requires iron, cobalt or nickel as a catalyst and a temperature of 530-730 degrees Celsius. Since the reaction releases some heat, it could be self-sustaining if you could supply enough hydrogen. A problem with this is that the production of elemental carbon tends to foul the catalyst's surface, which is detrimental to the reaction's efficiency [ref].

Molecular Weight of CO2	= 44
Molecular Weight of 2*H2	= 4
Total Initial Molecular Weight	= 48
Molecular Weight of C (graphite)	= 12
Molecular Weight of 2*H2O	= 36
Total Final Molecular Weight	= 48

You could use water from e.g. comets to get hydrogen for the reaction. You would have to use electrolysis to split the water and initiate the reaction. Alternatively you could reuse the water output from the Bosch reaction, recover the hydrogen and continue, but this way you wouldn't have oceans on Venus at the end.

If you were to provide hydrogen for the reaction (about 4E19 kg), this massive reaction would create an ocean with a bit more than 1/4 of the volume of Earth’s oceans:

Ratio of 2*H2O to CO2 (36 / 44)	= 0.818
Resultant Mass of H2O	= 3.79E+20 kg
Density of H2O	= 1.000 g / cm^3
	= 1,000,000.000 g / m^3
	= 1,000.000 kg / m^3
Volume of Resultant H2O	= 3.79E+17 m^3
	= 3.79E+08 km^3
Area of Venus Surface	= 4.602E+08 km^2
Average Depth of H2O	= 0.824 km
Total Volume of Earth's Oceans	= 1.300E+09 km^3
Average Depth of Earth's Oceans	= 3.682 km

It would also result in the deposition of a layer of graphite with an average thickness over the entire surface of Venus roughly equal to a 40 story building:

Ratio of graphite to CO2 (12 / 44)	= 0.273
Resultant Mass of graphite	= 1.26E+20 kg
Density of C (graphite)	= 2.230 g / cm^3
	= 2,230,000.000 g / m^3
	= 2,230.000 kg / m^3
Volume of Resultant C (graphite)	= 5.66E+16 m^3
	= 5.66E+07 km^3
Area of Venus Surface	= 4.602E+08 km^2
Average Depth of C (graphite)	= 0.123 km
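
The water and graphite figures can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the input values are the ones quoted in the tables, and the variable and function names are my own.

```python
venus_atmosphere_kg = 4.80e20   # total mass of Venus' atmosphere
co2_fraction = 0.965            # fraction of the atmosphere that is CO2
venus_area_km2 = 4.602e8        # surface area of Venus

co2_kg = venus_atmosphere_kg * co2_fraction

# CO2 + 2H2 -> C + 2H2O, with molecular weights 44 + 4 -> 12 + 36
hydrogen_kg = co2_kg * 4.0 / 44.0
water_kg = co2_kg * 36.0 / 44.0
graphite_kg = co2_kg * 12.0 / 44.0

def layer_depth_km(mass_kg, density_kg_m3):
    # spread the material evenly over the whole surface of Venus
    volume_km3 = mass_kg / density_kg_m3 / 1e9   # 1 km^3 = 1e9 m^3
    return volume_km3 / venus_area_km2

water_depth_km = layer_depth_km(water_kg, 1000.0)
graphite_depth_km = layer_depth_km(graphite_kg, 2230.0)

print('hydrogen needed: %.2e kg' % hydrogen_kg)
print('water depth:     %.3f km' % water_depth_km)
print('graphite depth:  %.3f km' % graphite_depth_km)
```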

So after converting all the carbon dioxide to graphite, you would be left with a ~120 metre thick blanket of carbon over the planet. This may not be terrible, though, since carbon makes a pretty good base for soils. There would still be a great deal of atmospheric pressure due to all the oxygen though, and a spark would result in a mighty conflagration that would undo all your work. A better idea may be to trap the carbon dioxide as carbonate rocks, but for this you would need large quantities of calcium and/or magnesium (about 8×10^20 kg of calcium or 5×10^20 kg of magnesium would be required). This would have the advantage of removing a lot of the oxygen, which would lower the atmospheric pressure. Unfortunately magnesium carbonate begins to decompose at 350 degrees C, so calcium might be a better bet. You would also need to remove all the sulphuric acid since it would dissolve all your carbonate rocks, using up the sulphuric acid but producing more carbon dioxide.

Of course you have to power all this as well; electrolysis doesn't come cheap. Solar panels can work well at Venus because the sunlight there is roughly twice as strong as at Earth. Perhaps some genetically engineered plant...

Wednesday 12 October 2016

CODZ ciphers, testing RC4 in python

Some of the CODZ ciphers consist of binary data, and it is not clear what sort of cipher has created them. Notably 2,3,7 and 10 are base64 converted binary data. Ciphers 4,5,11, and 14 are hex data. I will be dealing with 2,3,7 and 10 in this post.

Cipher 2 decodes to: dd3ed3b56ff21a7751ebbbaf6ffd1086f190a6b29b49c2a47e656bcc91768346df0321db23505874cb9f8a71d978490e and 3a411e6479e27ee31d601d486c7c807dbd20d85273ad599f90a7126caae9406be1434fdbbeeefa45d9c6cbf5ad4fb4f7 depending on whether you read the base64 data forwards or backwards. Either way you get exactly 384 bits of binary, which is suspiciously similar to the SHA384 hash length. I tried running John the Ripper on the two hashes for a day or so, but nothing turned up. Either it is not SHA384 or we're gonna need some better word lists.

Ciphers 3, 7 and 10 are longer. I have tested their entropy after breaking them up into segments of different lengths, and the entropy is pretty much always equal to that of random data. This means they are highly unlikely to have been generated by a classical cipher; more likely it is something modern like RC4. I have tried decrypting the 3 ciphers as RC4 using a wordlist for the keys, but no luck.

Code for testing entropy:


from base64 import b64decode
from bitarray import bitarray
from math import log

ctext = 'iW9cXmzOU7ZuZBtW40b3ng...' # I cut the rest of the cipher off to save space
data = b64decode(ctext)

a = bitarray(endian='little')
a.frombytes(data)

for bitlen in range(1,16):
  # fixed length code: every bitlen-bit pattern maps to its integer value
  code = {}
  for i in range(2**bitlen):
    string = format(i,'0'+str(bitlen)+'b')
    code[i] = bitarray(string)
  # drop trailing bits that don't fill a whole symbol, then split into symbols
  usable = len(a) - len(a) % bitlen
  num = list(a[:usable].decode(code))
  # count how often each symbol occurs
  freq = {}
  for i in num:
      if i in freq: freq[i] += 1.
      else: freq[i] = 1.
  N = len(num)
  # entropy in nats, compared with the entropy of uniformly random symbols
  en = 0
  for i in freq.values():
    p = i/N
    en -= p*log(p)
  print('%d entropy: %f if random: %f' % (bitlen, en, -log(1./2**bitlen)))

For decrypting RC4, we keep the key corresponding to the decrypt with the lowest entropy. If the wrong key is used, something high entropy will result, but if it decrypts to text we should get something easily identifiable. Python code for decrypting RC4 with a word list:


import base64
from math import log

# ctext is the base64 cipher text, as in the entropy code above
data = bytearray(base64.b64decode(ctext))

def rc4(data,key):
    S = list(range(256))
    j = 0
    out = []

    #KSA Phase
    for i in range(256):
        j = (j + S[i] + ord( key[i % len(key)] )) & 0xFF
        S[i] , S[j] = S[j] , S[i]

    #PRGA Phase
    i = j = 0
    for byte in data:
        i = ( i + 1 ) & 0xFF
        j = ( j + S[i] ) & 0xFF
        S[i] , S[j] = S[j] , S[i]
        out.append(byte ^ S[(S[i] + S[j]) & 0xFF])
    return out # a list of byte values

# find a good word list and use it here
keylist = open(r"C:\Users\james\Desktop\cipher_stuff\simplesub_word\count_1w.txt")

besten = 10e10
bestkey = ""
bestout = ""
for count,line in enumerate(keylist):
  fields = line.split()
  if not fields: continue # skip blank lines
  word = fields[0]
  # try each word capitalised, in upper case and in lower case
  for key in (word.capitalize(), word.upper(), word.lower()):
    out = rc4(data,key)
    # compute entropy of the decoded text
    freq = {}
    for c in out:
        if c in freq: freq[c] += 1.
        else: freq[c] = 1.
    N = len(out)
    en = 0
    for f in freq.values():
        p = f/N
        en -= p*log(p)
    if en < besten:
        besten = en
        bestkey = key
        bestout = ''.join(chr(c) for c in out[:20])
  if count % 500 == 0: print('%d %s %s %f %s' % (count, key, bestkey, besten, bestout))
print('%d %s %s %f %s' % (count, key, bestkey, besten, bestout))
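
Before trusting a negative result from a word list search, it is worth checking the RC4 routine against a known test vector. The sketch below is a standalone Python 3 version working on bytes (the name rc4_bytes is mine); it encrypts "Plaintext" with the key "Key", a commonly quoted RC4 test vector whose ciphertext is BBF316E8D940AF0AD3.

```python
import binascii

def rc4_bytes(data, key):
    # key scheduling (KSA)
    S = list(range(256))
    j = 0
    for i in range(256):
        j = (j + S[i] + key[i % len(key)]) & 0xFF
        S[i], S[j] = S[j], S[i]
    # keystream generation (PRGA), XORed with the data
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + S[i]) & 0xFF
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) & 0xFF])
    return bytes(out)

ciphertext = rc4_bytes(b"Plaintext", b"Key")
print(binascii.hexlify(ciphertext).decode())
```

Since RC4 is its own inverse, running rc4_bytes on the ciphertext with the same key should give the plaintext back.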

Sunday 9 October 2016

CODZ 8. sha_paper

I haven't actually played the game, I just like breaking codes.

Apparently there are some new CODZ ciphers, the transcripts can be seen here: http://www.callofdutyzombies.com/topic/183529-all-revelations-ciphers-texture-files/#comment-1759652.

This post will look at the following cipher:


bx re yh zy bf lm kt ut yg se tb sx ky co jh km aq we tx wx
cy ji ut vt kn vc gx aw ij av qn lg ef fj uq bd kn sv cx fn
je wr rk kn cg aw xq vn zf li fh vz wt ta ia ij zf eh uf tj 
qm yg hl yq cx ij vw ig de qz tg nj rs er vk tm sa yv tw hr 
hs lt vy kr qc tv gh hb jn yb qh er ut gk et cs wv jl rh xo 
wr ex hr xt zi kc xs qs fd wd cm ku ah fh fj lf ui ly sh vf 
au xm hx qw dl gi cx vb dh wt xm kv un ej kt kt ye cq jd ef 
eh zv xt he uz tg cl jw nr tw ur vo jt jo ru iq iy rz ey ho 
gd nq yn bq ul ai fh bu ji ho nw qg yg vj if yv zu id jc gh 
ke xr qf cq ra it gw dl fc gq ti iu qu ny vr gy sj rh iu hi 
wr mv ym zi lk re vk xu ry uq gs ve gd yn bq ch ky er qh jr 
ho ya el ky zj ei hz cb if dk

There are 460 characters total, 25 distinct characters, IC is 0.0418. Counts:

a: 12, c: 18, b: 11, e: 23, d: 12, g: 20,
f: 19, i: 24, h: 29, k: 21, j: 23, m: 9,
l: 14, o:  7, n: 15, q: 23, s: 12, r: 24,
u: 20, t: 26, w: 18, v: 23, y: 25, x: 19,
z: 13
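
For reference, the index of coincidence can be computed directly from the letter counts. This is a small sketch using the counts listed above; the formula is the standard one, IC = sum of n(n-1) over all letters divided by N(N-1).

```python
counts = {
    'a': 12, 'c': 18, 'b': 11, 'e': 23, 'd': 12, 'g': 20,
    'f': 19, 'i': 24, 'h': 29, 'k': 21, 'j': 23, 'm': 9,
    'l': 14, 'o': 7,  'n': 15, 'q': 23, 's': 12, 'r': 24,
    'u': 20, 't': 26, 'w': 18, 'v': 23, 'y': 25, 'x': 19,
    'z': 13,
}

total = sum(counts.values())
# probability that two letters drawn at random from the text are the same
ic = sum(n * (n - 1) for n in counts.values()) / float(total * (total - 1))
print('%d letters, %d distinct, IC = %.4f' % (total, len(counts), ic))
```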

Since this cipher uses pairs of characters, and there are 25 distinct characters, it is highly likely to be foursquare, bifid, playfair or some other digraphic substitution cipher based on a 5 by 5 grid. Usually I and J are combined, but it looks like in this cipher P is missing, so it was combined with something else.

Other ciphers that use a 25 letter alphabet include BAZERIES, BIFID, CADENUS, CHECKERBOARD, CM BIFID, FOURSQUARE, PHILLIPS, PHILLIPS-C, PHILLIPS-RC, PLAYFAIR, SERIATED PLAYFAIR, TRI-SQUARE, TWIN BIFID and TWO-SQUARE. Unfortunately this is quite a lot, and trying them all will be time consuming.

I am trying 4 variants of the above cipher: the original cipher: bxreyh..., the reversed cipher: kdfibc..., forward but with each pair reversed: xberhy..., and each pair forwards, but the pairs from last to first: dkifcb... .
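
The four readings are easy to generate mechanically. A minimal sketch, using only the first five pairs of the ciphertext as sample input (the variable names are my own):

```python
pairs = "bx re yh zy bf".split()   # sample: first five pairs only

forward = "".join(pairs)
variants = {
    "original": forward,
    "reversed": forward[::-1],
    "each pair reversed": "".join(p[::-1] for p in pairs),
    "pairs last to first": "".join(reversed(pairs)),
}

for name in sorted(variants):
    print(name, variants[name])
```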

BAZERIES is a substitution + transposition, which means its IC would be that of normal text. Since the IC of this text is ~0.04, I am confident in saying it is not BAZERIES. It also can't be CADENUS because the message length is not a multiple of 25. CHECKERBOARD uses a 5x5 key square, but can at maximum use 20 characters in the ciphertext.

Possibilities after thinking a bit: BIFID, CM BIFID, TWIN BIFID, FOURSQUARE, PHILLIPS, PLAYFAIR, SERIATED PLAYFAIR, TRI-SQUARE, and TWO-SQUARE. The next piece of information is that none of the letter pairs in the ciphertext contain a doubled letter e.g. AA, BB, CC etc. This is highly unlikely to occur with most ciphers, but it is a peculiarity of PLAYFAIR. So this puts playfair near the top of the list.

I have tried the playfair cryptanalysis program over here on all four variants and it hasn't worked. I did a test sentence of the same length which it cracked in just a few iterations, so I am fairly confident in saying it is not a vanilla playfair, or at least if it is then something else is going on as well (such as some sort of route cipher).

For bifid, there is no obvious period identifiable (see here for how to compute it), and some quick simulated annealing didn't turn anything up.

Regarding the possibility of routes, 460 has a lot of factors: 2x230, 4x115, 5x92, 10x46, 20x23. This means lots of routes around the rectangles when the characters are lined up right. Exploring this would add a lot of extra time to the cracking effort.
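
The candidate rectangle sizes can be listed with a one-line helper (a sketch; factor_pairs is just a name I picked):

```python
def factor_pairs(n):
    # every (rows, columns) rectangle with rows <= columns that holds n characters
    return [(a, n // a) for a in range(1, int(n ** 0.5) + 1) if n % a == 0]

rectangles = factor_pairs(460)
print(rectangles)
```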

The quest continues...

Saturday 8 October 2016

CODZ cipher 13. kin_paper_torn straddle checkerboard?

I haven't actually played the game, I just like breaking codes.

Apparently there are some new CODZ ciphers, the transcripts can be seen here: http://www.callofdutyzombies.com/topic/183529-all-revelations-ciphers-texture-files/#comment-1759652.

This post will look at the following cipher:

@CB?>?>C@F?A?>CCCG??@C>?@C@C>???G=?C?=C@CGBC@CB?G@A
?=CB?C?G<?>CB@C=CB?<FG<?>C>C>CA?B?AGA?F?C?@CA?G=?B?>C@B?

TLDR

EDIT: I'm beginning to think that this is not the final decryption, that they have used a slightly modified method which results in some extra characters. There is too much decrypted for me to think that it is totally wrong, but I believe there is an extra step I'm missing somewhere in the middle. </EDIT>

The ciphertext is a straddle checkerboard. Converting the 10 symbols to digits and reversing the string, we get the following (using @=0, C=1, B=2, ?=3, >=4, F=5, A=6, G=7, = is 8, <=9; note that this mapping is arbitrary, and any other mapping would work just as well):

32014323873610313536763236141414397593218102143973132183607321012710183138733341010341033711143635014343210
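
The symbol to digit conversion can be sketched in a few lines of Python. The mapping is the arbitrary one given above, and the variable names are my own.

```python
ciphertext = (
    "@CB?>?>C@F?A?>CCCG??@C>?@C@C>???G=?C?=C@CGBC@CB?G@A"
    "?=CB?C?G<?>CB@C=CB?<FG<?>C>C>CA?B?AGA?F?C?@CA?G=?B?>C@B?"
)

symbol_to_digit = {
    "@": "0", "C": "1", "B": "2", "?": "3", ">": "4",
    "F": "5", "A": "6", "G": "7", "=": "8", "<": "9",
}

forward = "".join(symbol_to_digit[s] for s in ciphertext)
reversed_digits = forward[::-1]
print(reversed_digits)
```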

See the link for a description of the straddle checkerboard. Using this key:

  0123456789
  C.D.JYPEKW     
1 NXV.T.QFUL 
3 ZSOGIARVBH

The message decrypts to OCTOBERNSAREFORTTTHEYFOUNDTHESOURCEONVENUSBEGINNINGEXTRACTION. with spaces: OCTOBER NSA REPORT T THEY FOUND THE SOURCE ON VENUS BEGINNING EXTRACTION. There are likely a couple characters wrong still, or the slight mistakes are intentional.
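
For anyone who wants to replay the decryption, here is a minimal straddle checkerboard decoder using the digit string and key from above. It is a sketch under the assumption that the row labels 1 and 3 are the two blank columns of the top row; the output should match the message up to the couple of doubtful letters already mentioned.

```python
digits = ("320143238736103135367632361414143975932181021439731321"
          "83607321012710183138733341010341033711143635014343210")

top_row = "C.D.JYPEKW"                 # dots mark the two blank columns
lower_rows = {"1": "NXV.T.QFUL",       # row reached via the prefix digit 1
              "3": "ZSOGIARVBH"}       # row reached via the prefix digit 3

def straddle_decode(digits, top_row, lower_rows):
    out = []
    i = 0
    while i < len(digits):
        d = digits[i]
        if d in lower_rows and i + 1 < len(digits):
            # prefix digit: the next digit picks the column in that row
            out.append(lower_rows[d][int(digits[i + 1])])
            i += 2
        else:
            out.append(top_row[int(d)])
            i += 1
    return "".join(out)

plaintext = straddle_decode(digits, top_row, lower_rows)
print(plaintext)
```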

My approach

There are 10 different characters, 107 in total. The counts for each character are as follows: A: 6, @: 11, C: 24, B: 9, G: 8, F: 3, =: 5, <: 3, ?: 28, >: 10. It so happens that 10 characters can be converted to digits, so we can look at existing ciphers that use digits e.g. the straddle checkerboard, GRANDPRE, MONOME-DINOME, MORBIT, NIHILIST SUBSTITUTION, POLLUX or TRIDIGITAL. There are probably also more.

Converted to digits it is: 01234341053634111733014301014333783138101721012370638123137934120181239579341414163236763531301637832341023. If you reverse it, you get 32014323873610313536763236141414397593218102143973132183607321012710183138733341010341033711143635014343210. I'll be talking about the non-reversed ciphertext, but I'll be applying all the steps to both just in case. Using the above ciphertext and this calculator, we can compute the most likely cipher algorithms, and it concludes that it is most likely a nihilist substitution. Unfortunately, it can't be nihilist because it has an odd number of characters, and nihilist requires an even number. It can't be GRANDPRE for the same reason. CryptoCrack (from here) can't break it as GRANDPRE, MORBIT, NIHILIST SUBSTITUTION, POLLUX or TRIDIGITAL.

That really only leaves straddle checkerboard as the last candidate from the ciphers I can think of. Assuming it is a straddle checkerboard, we first have to find the location of the blanks in the key. If we can find those locations, we can decrypt it with any letters and we will have a plain substitution cipher. There are 10 choose 2 = 45 ways of putting the first 2 blanks, and 20 choose 2 = 190 ways of putting the second. This means there are 45*190 = 8550 possible configurations of putting the blanks. Some of these will be invalid though, e.g. the key:

   0 1 2 3 4 5 6 7 8 9
   f k m   c p d   y e
3: h b i g q r o s a z
7: l u t j n w v x

can't have 78 or 79 in the ciphertext, because those two squares of the key are blank. If these numbers do exist in the ciphertext, then this is not a possible set of blanks. After trying to decrypt our ciphertext with all 8550 keys, only 3340 are actually valid blank positions. In addition to this, most of the blank positions result in identical outputs to other blank positions, so we can discard all the duplicate outputs, leaving only 45 substitution ciphers that we have to try.
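
The blank position test can be sketched on its own, without decrypting anything. In this sketch (names are mine) the top row blanks are given as two digits, and the lower blanks as two positions in 0..19, where 0..9 is the row reached via the first blank digit and 10..19 the row reached via the second. The digit string is the reversed ciphertext from the TLDR section.

```python
from itertools import combinations

digits = ("320143238736103135367632361414143975932181021439731321"
          "83607321012710183138733341010341033711143635014343210")

def is_valid_blanks(digits, top_blanks, lower_blanks):
    first, second = top_blanks
    pending = None          # row offset while waiting for a column digit
    for ch in digits:
        d = int(ch)
        if pending is not None:
            if pending + d in lower_blanks:
                return False    # the text points at a blank square
            pending = None
        elif d == first:
            pending = 0
        elif d == second:
            pending = 10
    return pending is None      # must not end halfway through a pair

configs = [(t, l) for t in combinations(range(10), 2)
                  for l in combinations(range(20), 2)]
valid = [c for c in configs if is_valid_blanks(digits, c[0], c[1])]
print(len(configs), len(valid))
```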

After trying to decrypt the resulting 45 substitution ciphers from the forward cipher and another 45 from the reversed cipher, one of the decrypts, namely SALSYFWIRVWFESWLLLZFDHSOIBLZFRSOWAFSIKFIORYFTUIIUITFJLWVALUSI, can be decrypted to OCTOBERNSAREFORTTTHEYWOUNDTHESOURCEONMENUSBEGINNINGEXTRACTION. There are likely a couple characters wrong still, or the slight mistakes are intentional.

This is a bit of Python code that will spit out the possible substitution ciphers for a given straddle checkerboard ciphertext (please excuse the rough code, I wrote everything pretty quickly). I have since discovered there is a much shorter way of getting to the final candidates, but I'll leave this code here. To rank the final candidates, use the index of coincidence: the correct one is almost always the one with the IC closest to 0.07.

import random
import sys
from itertools import combinations

ctext = '6909746723099383772753870703607230943837727093872638757438333832743772974928387272384175943874720383270'

''' decrypt a straddle checkerboard cipher ctext given a key
 key should look like e.g. 'fkm.cpd.yehbigqrosazlutjnwvx..' 
 it should be 30 chars in length, and have 4 dots. 
 2 dots in the first 10 chars, and 2 in the last 20.
 - if it returns 0 in the first result, the key was invalid '''
def scdecrypt(ctext,key):
    dotpos = []
    for i,k in enumerate(key): 
        if k == '.': dotpos.append(i)
    output = ""        
    if dotpos[0] > 9: return 0,output
    if dotpos[1] > 9: return 0,output
    if dotpos[2] < 10: return 0,output
    if dotpos[3] < 10: return 0,output
    flag = 0
    for cc in ctext:
        c = int(cc)
        if key[c] != '.' and flag == 0: 
            output += key[c]
        elif flag == 1: 
            if key[10+c] == '.': return 0,""
            output += key[10+c]
            flag = 0
        elif flag == 2: 
            if key[20+c] == '.': return 0,""        
            output += key[20+c]
            flag = 0
        elif c == dotpos[0]: flag = 1     
        elif c == dotpos[1]: flag = 2
        else: 
            return 0,output
    return 1,output

''' see if word and ct are possibly the same substitution cipher 
returns 0 in the first part of the result if they are not '''        
def canstart(ct,word):
    key = {}
    if len(word) > len(ct): return 0,key
    for i,char in enumerate(word):
        if ct[i] not in key: 
            if char not in key.values():           
                key[ct[i]] = char
            else: return 0,key
        elif key[ct[i]] == char: pass
        else: return 0,key
    return 1,key

# get a list of all the possible blank spot permutations
dp = []
a = combinations(range(10),2)
for i in a:
    b = combinations(range(20),2)
    for j in b:
        dotpos = list(i)
        dotpos.append(10+j[0])
        dotpos.append(10+j[1])
        dp.append(dotpos)

# we have all possible blank spot positions, try decrypting with each and see if we can discard any
count = 0
texts = []
for d in dp:
    key = list('ABCDEFGHIJKLMNOPQRSTUVWXYZ')
    for pos in d: key.insert(pos,'.')
    
    bool,ptext = scdecrypt(ctext,key)
    if bool == 0: continue
    texts.append(ptext)
    count += 1

# now we have a heap of valid decrypts, discard duplicates
lst = []
for i in range(len(texts)):
    cs = False
    for j in range(len(lst)):
        if canstart(lst[j],texts[i])[0]: cs = True
    if not cs: lst.append(texts[i])

# print the unique possible decrypts, they must be solved as substitution ciphers
for text in lst:
    print(text)

Thursday 29 September 2016

The CODZ Mob of the dead ADFGX cipher

The ADFGX cipher from CODZ has been unsolved for quite a while, even with many people trying to break it. I thought I would document some of my progress on this page. Other Discussions:

There are of course other discussions around, but I'm not sure how much useful information is in them. Apparently the developers of the game have said that this was one of the early ciphers they made, and it may be too difficult to break. But we will have a go anyway.

EDIT: later thoughts

This paragraph has been added later, but I have decided that most of the analysis on this page is useless because I can't crack length 34 substitution ciphers using all the knowledge and code I have (which is mostly over here). I have made a couple of my own length 34 substitution ciphers, and I can't get anywhere even close to cracking them (If it were e.g. 100 characters it would be totally possible). So I might think about how to solve that problem. But until then, there is little point in the analysis below, because even without the ADFGX keyword steps it is impossible to solve (for me at the moment).

Some Observations

An ADFGX cipher with 68 characters means the plaintext has only 34 characters. Unfortunately this is close to the unicity distance for substitution ciphers, so it may be unbreakable from that perspective unless some extra information can be found. Most substitution cipher solvers can't break text this short as there are incorrect solutions that will score higher (using ngram scorers) than the true decryption. The ciphertext is as follows:

FFGXGD
GFFAGF
GGDDGF
FFXXFF
FDGFFG
FDGFFG
FDGGFF
FGFGAA
FXFXDX
XFXDGF
FAGGFF
    AF

There are two components to an ADFGX key: a keyword e.g. 'GERMAN' and a substitution key for the Polybius square. Much of the discussion on various forums is about finding the ADFGX keyword, however I think this is unnecessary. With a length 6 keyword there are only 6! = 720 possible keys, and only 2! * 4! = 48 possibilities if we assume the last two columns are the first 2 characters of the keyword (this is a reasonable assumption because they are dangling at the end like that).

This means we can decrypt the cipher using each of the possible 48 keyword permutations (this gives us 48 Polybius square ciphers), then use an arbitrary Polybius square key e.g. "ABCDEFGHIKLMNOPQRSTUVWXYZ" to get 48 ciphers which are just substitution ciphers. From here, ideally, it would just take a good substitution cipher solver applied to each of the 48 candidates to determine which can be broken resulting in English text. It turns out that, due to symmetry of the Polybius square, there are actually only 24 possible substitution ciphers, which makes our job easier. Unfortunately 34 characters is just too short for most substitution cipher crackers, or alternatively one of the previous assumptions is wrong.
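
The procedure can be sketched as follows. This is not the exact code I used, just an illustration under the stated assumptions: the grid is exactly as printed above, the dangling "AF" belongs to the last two columns, those two columns go to the first two keyword positions, and the Polybius square is the arbitrary one given above. All names are made up for the example.

```python
from itertools import permutations

rows = ["FFGXGD", "GFFAGF", "GGDDGF", "FFXXFF", "FDGFFG", "FDGFFG",
        "FDGGFF", "FGFGAA", "FXFXDX", "XFXDGF", "FAGGFF"]
tail = "AF"   # dangling pair, assumed to sit under the last two columns

# the six columns of the printed grid, top to bottom
columns = ["".join(row[c] for row in rows) for c in range(6)]
columns[4] += tail[0]
columns[5] += tail[1]

SQUARE = "ABCDEFGHIKLMNOPQRSTUVWXYZ"   # arbitrary Polybius key
SYMBOLS = "ADFGX"

def polybius(symbols):
    # turn each pair of ADFGX symbols into one letter of the square
    return "".join(SQUARE[5 * SYMBOLS.index(a) + SYMBOLS.index(b)]
                   for a, b in zip(symbols[0::2], symbols[1::2]))

def candidates():
    # the two long columns go first (2! orders), the rest follow (4! orders)
    for long_cols in permutations([4, 5]):
        for short_cols in permutations([0, 1, 2, 3]):
            order = long_cols + short_cols
            cols = [columns[c] for c in order]
            # read the rearranged grid row by row
            symbols = "".join(col[r] for r in range(12)
                              for col in cols if r < len(col))
            yield order, polybius(symbols)

results = list(candidates())
print(len(results), results[0][1])
```

Each of the resulting strings is then a plain substitution cipher to hand to a solver.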

Of course I am not the first to think of this, and many people have tried this exact procedure to no avail.

The Repeated Rows

There are two identical rows in the ciphertext: FDGFFG FDGFFG. This means, regardless of the keyword, there will be 6 letters in the plaintext that consist of 3 letters repeated e.g. THETHE. How often does this sort of thing occur in English? Quite often actually. I have a corpus of around 50 million English sentences from news websites and wikipedia which can be used to determine the frequency of this sort of occurrence.

It so happens that in every million English sentences, you would expect around 34,000 to have a repeated triple like this, consisting of around 1100 distinct sequences (i.e. many of the 34,000 are repeats of previously seen sequences). In total after 50 million sentences 6164 distinct 6-character sequences were found, occurring 1.89 million times in total. The most common are: THATHA (235167), ANDAND (176355), INGING (122567), THETHE (89908), REAREA (67514), NTONTO (42511), ETHETH (36691), SSESSE (31542), NDINDI (31259), ASSASS (29754), followed by many other rarer ones.
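
Finding these sequences in a sentence only takes a few lines. This is a sketch of the kind of scan involved, assuming spaces and punctuation are stripped first; it is not the exact script I ran over the corpus, and the function name is made up.

```python
import re

def repeated_triples(sentence):
    # keep letters only, then look for 3 letters immediately repeated
    letters = re.sub(r"[^A-Z]", "", sentence.upper())
    return [letters[i:i + 6] for i in range(len(letters) - 5)
            if letters[i:i + 3] == letters[i + 3:i + 6]]

print(repeated_triples("sand and gravel"))
print(repeated_triples("that hat"))
```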

To utilise this information, it may be possible to fix the letters in the substitution key so that the repeated 3-letter combo is e.g. fixed to THATHA. This might make it easier for an n-gram scorer to identify the rest of the letters if some are known. I haven't tried this yet.

A Dictionary attack on the Polybius bit

Apparently past ciphers in the game have used keywords to generate keys for ciphers. This means we may be able to use a dictionary attack to generate polybius keys for decrypting the adfgx. This means we don't have to crack the substitution component, which would make things easier.

After doing this, the best decryption I could get from the 48 keyword permutations was VOGANHEMOWIGHHEHHENHEKANZEWESINHAC with a 4-gram score of -166. So this means either the polybius key was not a dictionary word (my dictionary has about 500k words in it), the adfgx keyword is not length 6, or some other assumption we have made is wrong.
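
For completeness, this is the sort of key generation I mean: a keyed Polybius alphabet built from a dictionary word. The sketch assumes the usual convention (keyword letters first with duplicates dropped, then the rest of the alphabet, J folded into I); the game may well use a different convention.

```python
def keyed_square(keyword, alphabet="ABCDEFGHIKLMNOPQRSTUVWXYZ"):
    # keyword letters first (no repeats), then the unused letters in order
    seen = []
    for ch in keyword.upper().replace("J", "I") + alphabet:
        if ch in alphabet and ch not in seen:
            seen.append(ch)
    return "".join(seen)

print(keyed_square("german"))
```

Each word in the dictionary gives one 25-letter key, which is then tried against all 48 keyword permutations.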
