## Thursday, 29 September 2016

The ADFGX cipher from CODZ has been unsolved for quite a while, even with many people trying to break it. I thought I would document some of my progress on this page. Other Discussions:

There are of course other discussions around, but I'm not sure how much useful information is in them. Apparently the developers of the game have said that this was one of the early ciphers they made, and it may be too difficult to break. But we will have a go anyway.

# EDIT: later thoughts

This paragraph has been added later, but I have decided that most of the analysis on this page is useless because I can't crack length 34 substitution ciphers using all the knowledge and code I have (which is mostly over here). I have made a couple of my own length 34 substitution ciphers, and I can't get anywhere even close to cracking them (If it were e.g. 100 characters it would be totally possible). So I might think about how to solve that problem. But until then, there is little point in the analysis below, because even without the ADFGX keyword steps it is impossible to solve (for me at the moment).

# Some Observations

An ADFGX cipher with 68 characters means the plaintext has only 34 characters. Unfortunately this is close to the unicity distance for substitution ciphers, so it may be unbreakable from that perspective unless some extra information can be found. Most substitution cipher solvers can't break text this short as there are incorrect solutions that will score higher (using ngram scorers) than the true decryption. The ciphertext is as follows:

FFGXGD
GFFAGF
GGDDGF
FFXXFF
FDGFFG
FDGFFG
FDGGFF
FGFGAA
FXFXDX
XFXDGF
FAGGFF
AF

There are two components to an ADFGX key: a keyword e.g. 'GERMAN' and a substitution key for the Polybius square. Much of the discussion on various forums is about finding the ADFGX keyword, however I think this is unneccessary. With a length 6 keyword there are only 6! = 720 possible keys, and only 48 possibilities if we assume the last two columns are the first 2 characters of the keyword (this is a reasonable assumption because they are dangling at the end like that).

This means we can decrypt the cipher using each of the possible 48 keyword permutations (this gives us 48 Polybius square ciphers), then use an arbitrary Polybius square key e.g. "ABCDEFGHIKLMNOPQRSTUVWXYZ" to get a 48 ciphers which are just a substitution cipher. From here, ideally, it would just take a good substitution cipher solver applied to each of the 48 candidates to determine which can be broken resulting in English text. It turns out that, due to symmetry of the polybius square, there are actually only 24 possible substitution ciphers, which makes our job easier. Unfortunately 34 characters is just too short for most substitution cipher crackers, or alternatively one of the previous assumptions is wrong.

Of course I am not the first to think of this, and many people have tried this exact procedure to no avail.

# The Repeated Rows

There are two identical rows in the ciphertext: FDGFFG FDGFFG. This means, regardless of the keyword, there will be 6 letters in the plaintext that consists of 3 letters repeated e.g. THETHE. How often does this sort of thing occur in English? Quite often actually. I have a corpus of around 50 million English sentences from news websites and wikipedia which can be used to determine the frequency of this sort of occurrence.

It so happens that in every million english sentences, you would expect around 34,000 to have a repeated triple like this, consisting of around 1100 distinct sequences (i.e. many of the 34,000 are repeats of previously seen sequences). In total after 50 million sentences 6164 distinct 6-character sequences were found, occurring 1.89 million times in total. The most common are: THATHA (235167), ANDAND (176355), INGING (122567), THETHE (89908), REAREA (67514), NTONTO (42511), ETHETH (36691), SSESSE (31542), NDINDI (31259), ASSASS (29754), followed by many other rarer ones.

To utilise this information, it may be possible to fix the letters in the substitution key so that the repeated 3-letter combo is e.g. fixed to THATHA. This might make it easier for an n-gram scorer to identify the rest of the letters if some are known. I haven't tried this yet.

# A Dictionary attack on the Polybius bit

Apparently past ciphers in the game have used keywords to generate keys for ciphers. This means we may be able to use a dictionary attack to generate polybius keys for decrypting the adfgx. This means we don't have to crack the substitution component, which would make things easier.

After doing this, the best decryption I could get from the 48 keyword permutations was VOGANHEMOWIGHHEHHENHEKANZEWESINHAC with a 4-gram score of -166. So this means either the polybius key was not a dictionary word (my dictionary has about 500k words in it), the adfgx keword is not length 6, or something other assumption we have made is wrong.