## Tuesday, 31 January 2017

### Generating Gray codes with python

Gray codes are often used in positioning systems, see the wiki article for more details. I want to generate gray codes with python, and I'd like to generate all possible gray codes for a specific number of positions and number of bits. The maximum number of positions for a gray code is 2^numbits i.e. a 4 bit gray code can have at most 16 positions. It can also have fewer. If you want the gray code to be circular i.e. the first and last code have hamming distance 1, then there needs to be an even number of positions. Odd is fine if you don't care if it is circular.

There are of course algorithms for generating gray codes, but a constraint solver works really well, and also allows you to generate all possible gray codes for a specific number of positions. The python code below is for generating gray codes using python-constraint:

from constraint import *from itertools import productnumbits = 4numpos = 16# this generates a list of all binary vectors of numbits bitsvecs = list(product(*(((1,0),)*numbits)))problem = Problem()# we will have numpos variables, i.e. one var for each code,# each var is an index into vecs, has 2^numbits possible valsproblem.addVariables(list(range(numpos)), list(range(len(vecs))))# vecs holds our binary vectors. # we want True if hamming distance between vecs[a] and vecs[b] is 1# False otherwise. This will be our constraint. def hammingDist1(a,b):    hamming = sum([vecs[a][i]^vecs[b][i] for i in range(numbits)])    if hamming == 1: return True    return False# add the important constraintsfor i in range(numpos):    j = (i+1)%numpos    problem.addConstraint(hammingDist1,[i,j])# make sure all the vectors are different    problem.addConstraint(AllDifferentConstraint())# print out the first solution we come acrosssol = problem.getSolution()for i,j in enumerate(sol):    print(vecs[sol[i]])

The main thing to note is that we add constraints so that each code has a hamming distance of 1 with its neighbour, and also that all the codes are diferent.

An example run looks like this (for a 4-bit, 16 position code):

C:\Users\james\Desktop>python gray.py(0, 0, 0, 0)(0, 0, 0, 1)(0, 0, 1, 1)(0, 0, 1, 0)(0, 1, 1, 0)(0, 1, 1, 1)(0, 1, 0, 1)(1, 1, 0, 1)(1, 0, 0, 1)(1, 0, 0, 0)(1, 0, 1, 0)(1, 0, 1, 1)(1, 1, 1, 1)(1, 1, 1, 0)(1, 1, 0, 0)(0, 1, 0, 0)

## Monday, 16 January 2017

### Latex Dictionary Template

I recently needed a latex dictionary template, there are a couple around on the internet, but they are all under non-commercial licences so I couldn't use them. Fortunately it is pretty easy to make one yourself, I have provided one here. Consider it to be under the wtfpl licence.

A short example of what the definitions look like: %Copyright 2017 James Lyons%This work is free. You can redistribute it and/or modify it under the%terms of the Do What The Fuck You Want To Public License, Version 2,%as published by Sam Hocevar. See http://www.wtfpl.net/ for more details.\documentclass[twoside]{book}\usepackage[utf8]{inputenc}\usepackage{multicol}\usepackage{fancyhdr} \usepackage[bindingoffset=0.2in,margin=0.75in,paperwidth=5.25in,paperheight=8in]{geometry}% set the headers so that first mark is always on left, second on right% this is so that each word is put in the header correctly \pagestyle{fancy} \fancyhead[LO,LE]{\rightmark}\fancyhead[RE,RO]{\leftmark}% set the font\renewcommand*\rmdefault{bch}\newcommand{\comment}{}% remove chapter numbering\usepackage{titlesec}\titleformat{\chapter}  {\Large\bfseries} % format  {}                % label  {0pt}             % sep  {\huge}           % before-code% remove section numbering\setcounter{secnumdepth}{0}% this is how we define the actual dictionary items. markboth is for % setting the page headers so that first and last words appear there.\newcommand{\ditem}{\item[#1] \markboth{\footnotesize \textbf{#1}}{\footnotesize \textbf{#1}}}\begin{document}%%%%%%%%%%%%%%%%%%%%%%%% title page\begin{center}\thispagestyle{empty}\vbox{\vspace{5cm}{\huge The Dictionary Title}}\vfill{\large Author}\vfill\end{center}\clearpage%%%%%%%%%%%%%%%%%%%%%%%% table of contents\tableofcontents%%%%%%%%%%%%%%%%%%%%%%%% Preface\chapter{Preface}This is the preface.\chapter{The Dictionary}% we'd like the text size to be a bit smaller for the definitions\footnotesize% we'd like two columns of definitions\begin{multicols}{2}\begin{description}\section{A}\ditem{Abandon} \textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abandoned} \textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2\ditem{Abide} \textit{-v.} definition 1 \textit{-n.} definition 2\end{description}\end{multicols} \end{document}

## Sunday, 15 January 2017

### Creating crosswords with python-constraint [part 2]

In the previous post I was generating the patterns of white and black squares. In this post I'd like to actually fill the patterns with words so that everything matches up. This sort of problem is very well suited to a constraint satisfaction solver, as it can try many possibilities until something is found that works. The trick is specifying the constraints in an efficient manner. My first solution was to implement it the same way as the magic word squares, but this was really inefficient and there is a much better way.

I put the code for this on github, it's a bit messy but works: https://github.com/jameslyons/crossword_gen.

The trick is you don't need to represent all the different squares in the crossword as separate variables and put the constraints on those, your actual variables will be indexes into the word list, and the constraints will be whether e.g. character 4 of word 1 is the same as character 2 of word 7. These constraints will obviously be different for every crossword and will have to be generated each time, but this isn't too hard.

The inputs from the previous step are hex strings e.g.

3EF9C51EBFAFAAF8FE37577BFB9D5CEFEF75763F8FAAFAFEBC51CFBEg

This can be converted to a binary string by replacing e.g. 3 with '0011', same as decoding hex anywhere. The function I use is down the bottom of the previous post. This gives a binary string that can be represented as the white and black squares of the crossword.

# Example run

The code takes as argument the hex representation of the pattern. It first prints a representation of the crossword with 'X's where the letters go, then once it gets solved it outputs the solved crossword, along with the list of words that got used. The dictionary I am using has some weird words in it, so you need to pay some attention to that if you want nice crosswords (I recommend using something like the 3of6game.txt from the 12dicts wordlist, it is specifically for games). The final thing that gets printed is the crossword represented as a string, this is passed to some typesetting code I have that produces nice pdfs of the crosswords.

>pypy cw_fill.py BEFB4516BFAFAAF8FE27573BFBDD5DEFEE75723F8FAAFAFEB4516FBEhbuilding word lists...X   X X X X X   X X X X X   XX   X       X   X       X   XX   X   X X X X X X X   X   XX X X X   X   X   X   X X X XX       X X X X X X X       X    X X X   X   X   X X XX X X   X X X X X X X   X X XX   X X X   X   X   X X X   XX X X   X X X X X X X   X X X    X X X   X   X   X X XX       X X X X X X X       XX X X X   X   X   X   X X X XX   X   X X X X X X X   X   XX   X       X   X       X   XX   X X X X X   X X X X X   Xstarting...count: 0s . c r u m b . r e m a n . bw . r . . . r . i . . . o . ra . a . p h o e b u s . s . im o w n . e . l . r . g y m si . . . c r i m i n e . . . k. . p o o . n . n . v e g . .a v o . i n f a u n a . i l ks . s u n . l . t . d i g . ip h i . a l a n i n e . o f t. . t a g . m . l . r a t . .i . . . e y e l e s s . . . om o t s . e . o . p . w a t ti . o . s t o p g a p . m . td . r . . . a . e . . . i . ae . c l a c k . l u l l s . r['swami', 'asp', 'imide', 'craw', 'posit', 'torc', 'coinage', 'her', 'yet', 'bro', 'inflame', 'oak', 'elm', 'lop', 'rib', 'inutile', 'gel', 'urn', 'spa', 'evaders', 'nosy', 'gigot', 'amis', 'brisk', 'kit', 'ottar', 'crumb', 'reman', 'phoebus', 'mown', 'gyms', 'crimine', 'poo', 'veg', 'avo', 'infauna', 'ilk', 'sun', 'dig', 'phi', 'alanine', 'oft', 'tag', 'rat', 'eyeless', 'mots', 'watt', 'stopgap', 'clack', 'lulls']solution time = 1.05939997405s.crumb.reman.bw.r...r.i...o.ra.a.phoebus.s.imown.e.l.r.gymsi...crimine...k..poo.n.n.veg..avo.infauna.ilks.sun.l.t.dig.iphi.alanine.oft..tag.m.l.rat..i...eyeless...omots.e.o.p.watti.o.stopgap.m.td.r...a.e...i.ae.clack.lulls.r

# The code

This takes the hex string output from the pattern generation step as an argument. It uses a wordlist called 'fewerwords.txt'. Each time it is run it should give different results because the word indexes are shuffled each time. Sometimes it will take a long time to find a solution, I suggest just killing it and running it again. Most of the time this will get a new starting point that allows it to be solved very quickly, though some patterns are just harder to solve than others.

from __future__ import print_functionfrom constraint import *from timeit import default_timer as timerfrom itertools import combinationsfrom math import sqrtfrom test_bin2hex import hex2binimport sys''' to run from pypypypy cw_fill.py 3EF9C51EBFAFAAF8FE37577BFB9D5CEFEF75763F8FAAFAFEBC51CFBEg'''# this program will take a crossword pattern and try to fill it with wordsif len(sys.argv) < 2: print('supply pattern as argument')else:    pattern = hex2bin(sys.argv)     pattern = [int(i) for i in pattern]s = int(sqrt(len(pattern)))print('building word lists...')# extract the word positions from the patternwordlocs = []hwordlocs = []vwordlocs = []for i in range(s):    hword =  []    vword = []    for j in range(s):        if pattern[i*s + j] == 0: print(' ',end=' ')        else: print('X',end=' ')        if pattern[i*s + j] == 1:            hword.append(i*s + j)        else:            if len(hword) > 1: hwordlocs.append(hword)            hword = []                         if pattern[j*s + i] == 1:            vword.append(j*s + i)        else:            if len(vword) > 1: vwordlocs.append(vword)            vword = []                if j == s-1:            if len(vword) > 1 and vword not in vwordlocs: vwordlocs.append(vword)            if len(hword) > 1 and hword not in hwordlocs: hwordlocs.append(hword)                print('')wordlocs.extend(vwordlocs)wordlocs.extend(hwordlocs)# now that we have the words, add constraints so each word is in the dictionarymaxL = 10wordlist = []f = open("fewwords.txt")for line in f:    w = line.split().lower()    wordlist.append(w)wordlen = []olen = 0for c,word in enumerate(wordlist):    if olen < len(word):        wordlen.append(c)    olen = len(word)problem = Problem()from random import shufflefor c,w in enumerate(wordlocs):    L = len(w)-3    r = list(range(wordlen[L],wordlen[L+1]))    shuffle(r)    problem.addVariables([c], list(r))problem.addConstraint(AllDifferentConstraint())def eq(a,b):    def temp(c1,c2):        word1 = wordlist[c1]        word2 = wordlist[c2]        if word1[a]==word2[b]: return True        return False    return temp        for c1,word1 in enumerate(wordlocs):    for c2,word2 in enumerate(wordlocs):        if word1==word2: continue        intsc = list(set(word1).intersection(set(word2)))        if len(intsc) > 0:            a = word1.index(intsc)            b = word2.index(intsc)            problem.addConstraint(eq(a,b),[c1,c2])            print('starting...')        def getwords(solution):    listq = []    for locs in wordlocs:        word = ''.join([solution[loc] for loc in locs])        listq.append(word)            return listq              # pretty print the outputstart = timer()b = ''for count,a in enumerate(problem.getSolutionIter()):      print('count: '+str(count))    sol = ['.' for i in range(s*s)]    for key,val in a.iteritems():        for c,char in enumerate(wordlist[val]):            sol[wordlocs[key][c]] = char          b=sol    for i in range(s):        for j in range(s):             print(sol[i*s+j],end=' ')        print('')    print('')      # get the words we have used    zz = getwords(sol)    print(zz)    breakend = timer()print("solution time = " + str(end-start))print(''.join([str(b[i]) for i in range(s*s)]))

# Intro

I put the code for this on github, it's a bit messy but works: https://github.com/jameslyons/crossword_gen.

In previous posts I have been playing with python-constraint, and building crosswords is something that interests me for some reason. I decided to do it in a two step process: step 1 generate the patterns of white and black squares, step 2 fill the patterns with letters. This post will deal with step 1 only, the next post will deal with step 2.

Generating the patterns is well suited to using a constraint solver since you can just specify the things you'd like to be true of the final result, then ask the constraint solver to do it for you. In this case we have e.g. 15 by 15 binary variables, we choose the constraints so that the final pattern looks like a crossword puzzle.

# What do pretty patterns look like

I am interested in automatically generating patterns that look like nice crossword patterns e.g.

Some of the things I'd like to be true of the patterns I generate:

1. I'd like 4-fold symmetry, because I think it looks good.
2. No 2x2 blocks of white squares.
3. No 2x2 blocks of black squares.
4. No length 1 words.
5. No length 2 words.
6. No words longer than MAXLEN.
7. No sequences of black squares longer than 3.
8. White squares must be fully connected.

Some of these conditions are pretty easy to implement with python-constraint, the fully connected constraint I can just apply to the final solutions and discard them at that point. By fully connected I mean you can get from one square to every other square in the pattern, so you can't get disjoint islands.

# The implementation

Examples of what the program outputs:

3EF9C51EBFAFAAF8FE37577BFB9D5CEFEF75763F8FAAFAFEBC51CFBEgX X . . . . . X . . . . . X X. . . X X X . X . X X X . . .. X . X . . . . . . . X . X .. . . . X . X . X . X . . . .. X X X . . . . . . . X X X .. X . . . X . X . X . . . X .. . . X . . . . . . . X . . .X X . . . X . X . X . . . X X. . . X . . . . . . . X . . .. X . . . X . X . X . . . X .. X X X . . . . . . . X X X .. . . . X . X . X . X . . . .. X . X . . . . . . . X . X .. . . X X X . X . X X X . . .X X . . . . . X . . . . . X X

The hex string is the binary to hex encoding of the pattern, 'X's are zeros, '.'s are ones. The hex string will be passed to step 2 which will fill in the words using a wordlist.

It's pretty easy to generate a few million example patterns with this program (if you are willing to wait a day), which is what I suggest doing. Once you have a heap of patterns you can further filter them by whatever criteria you want e.g. no more than X 3-letter words, no more than X% black squares etc. Once you have the filtered list, you can try filling them with words.

from __future__ import print_functionfrom constraint import *from itertools import combinationsfrom math import sqrt,floor# this program will create crossword patterns, but not fill it with words# i.e. just patterns of zeros and ones for the black and white squares# tests if a solution is fully connected i.e. can every point be reached from# any random point. True if fullyconnected, false if disjoint parts.# Could easily be made more efficient, but this works fine.def isfullyconnected(solution):    L = int(sqrt(len(solution)))    # get list of all non-zero points    plist = []    for i in range(L):        for j in range(L):            if solution[i*L + j]: plist.append((i,j))        # start top right corner, try to get to every other point    reached = {plist:1}    rlen = len(reached)    oldrlen = 0    while rlen != oldrlen:      temp = list(reached.keys())      for i,j in temp:        if (i+1,j) in plist: reached[(i+1,j)]=1        if (i-1,j) in plist: reached[(i-1,j)]=1        if (i,j+1) in plist: reached[(i,j+1)]=1        if (i,j-1) in plist: reached[(i,j-1)]=1      oldrlen = rlen      rlen = len(reached)    # if rlen is not eq number of all points, we couldn't reach some points    if rlen < len(plist): return False    return True        from random import randint    def getPattern(sidelen=15,maxlen=9):    problem = Problem()    s = sidelen    maxWordLen = maxlen    problem.addVariables(list(range(s*s)), [0,1])      # ensure there are no 2x2 blocks of ones or zeros    def no4x4block(*x):        if sum(x)==0: return False        if sum(x)==len(x): return False        else: return True    for i in range(s-1):        for j in range(s-1):            problem.addConstraint(no4x4block,[i*s+j,(i*s+j)+1,(i*s+j)+s,(i*s+j)+s+1])        #ensure matrix is symmetric      def eq(a,b):          return a==b    for i in range(s):        for j in range(s):            problem.addConstraint(eq,[i*s+j,(s-i-1)*s+j])            problem.addConstraint(eq,[i*s+j,i*s+(s-j-1)])            problem.addConstraint(eq,[i*s+j,(s-i-1)*s+(s-j-1)])           #ensure no words longer than maxWordLen    def maxlen(*x):        if sum(x) == len(x): return False        else: return True    seq = list(range(maxWordLen+1))    for i in range(s - maxWordLen):        for j in range(s):            problem.addConstraint(maxlen,[(k+i)+j*s for k in seq])            problem.addConstraint(maxlen,[(k+i)*s+j for k in seq])        # no two letter words    def no2letter(*x):        if x == (0,1,1,0): return False        else: return True        seq = list(range(4))    for i in range(s - 3):        for j in range(s):            problem.addConstraint(no2letter,[(k+i)+j*s for k in seq])            problem.addConstraint(no2letter,[(k+i)*s+j for k in seq])    # no two letter words around the edge either    def no2letterEdge(*x):        if x == (1,1,0): return False        else: return True        seq = list(range(4))    for i in range(s):        problem.addConstraint(no2letterEdge,[i,i+s,i+2*s])        problem.addConstraint(no2letterEdge,[i*s,i*s+1,i*s+2])        problem.addConstraint(no2letterEdge,[i*s+s-1,i*s+s-2,i*s+s-3])        problem.addConstraint(no2letterEdge,[i+s*(s-1),i+s*(s-2),i+s*(s-3)])        # no one letter squares    def no1letter(*x):            if sum(x) == 0: return True        if sum(x[1:]) == 0: return False        return True        seq = list(range(4))    for i in range(s):      for j in range(s):        vals = [i*s+j]        if i-1 >= 0: vals.append((i-1)*s+j)         if i+1 < s: vals.append((i+1)*s+j)         if j-1 >= 0: vals.append(i*s+(j-1))         if j+1 < s: vals.append(i*s+(j+1))         problem.addConstraint(no1letter,vals)     # no more than three 0's in a row    def nothreeblank(*x):        if sum(x) == 0: return False        return True        seq = list(range(4))    for i in range(s - 3):        for j in range(s):            problem.addConstraint(nothreeblank,[(k+i)+j*s for k in seq])            problem.addConstraint(nothreeblank,[(k+i)*s+j for k in seq])       for sol in problem.getSolutionIter():          if not isfullyconnected(sol): continue        yield(sol)    from test_bin2hex import bin2heximport sysprint(len(sys.argv))if len(sys.argv)<2: sidelen = 11else: sidelen = int(sys.argv)print('using sidelen',sidelen)f = open('pattern'+str(sidelen)+'.txt','w')for count,sol in enumerate(getPattern(sidelen=sidelen,maxlen=8)):    f.write(bin2hex(''.join([str(sol[a]) for a in range(sidelen*sidelen)]))+'\n')    if count % 100 == 0:       print('count: '+str(count))      print(bin2hex(''.join([str(sol[a]) for a in range(sidelen*sidelen)])))      for i in range(sidelen):        for j in range(sidelen):             if sol[i*sidelen+j]==1: print('.',end=' ')            else: print('X',end=' ')        print('')      print('')    #break

The hex string is generated using the following python script (test_bin2hex.py). The extra letters are so we can encode binary strings that aren't necessarily divisible by 4.

bin = ['0000','0001','0010','0011','0100','0101','0110','0111','1000','1001','1010','1011','1100','1101','1110','1111',       '0','1','00','01','10','11','000','001','010','011','100','101','110','111']hex = "0123456789ABCDEFghijklmnopqrst"def bin2hex(binstr):    map = dict(zip(bin,hex))    ret = ''    for i in range(0,len(binstr),4):        sub = binstr[i:i+4]        ret += map[sub]    return ret    def hex2bin(hexstr):    map = dict(zip(hex,bin))    ret = ''    for i in hexstr:        ret += map[i]    return ret

## Monday, 2 January 2017

### Magic Word Squares with python-constraint

This post will deal with the seemingly boring topic of magic word squares - I define these to be squares of letters where each row is a dictionary word, and each column is also a dictionary word. In my wordlist, there are 3,641 4-letter words (I talk about how I generated the wordlist below). How many possible 4 by 4 word squares do you think there are? Would you believe 90.6 million? I was rather surprised to discover there were so many. What about 8 by 8 word squares (given there are 12,317 8 letter words in the wordlist)? would you believe 0? It turns out it gets much harder to find word squares as you increase the side length.

I'll display some of the word squares and other stats before I display the python code I used to generate all these things further down.

# Some example word squares

 a z on o wt o n f u z za l o eg a r s s n i t f r i z zl a r e eo d o r sp i n o ts i s s y z y g o t ey e l p e dg l u i n go p i a t et e n t e re d g e r s e s c a p e ss e a l a n tc a s e t t ea l e r t e rp a t t e r ne n t e r a ls t e r n l y

# How many word squares are there?

I counted how many word squares there are for each side length. I was suprised how many there are for 4x4 squares, but then there are no 8x8 squares or higher. The No. of words is how many words of that particular length occur in the wordlist.

 Side Length No. of words No. of squares 3 1,014 2,607,858 4 3,641 90,669,454 5 6,683 10,991,077 6 9,559 68,518 7 11,960 98 8 12,317 0 9 11,181 0 10 8,760 didn't check 11 6,296 didn't check 12 4,082 didn't check 13 2,493 didn't check 14 1,326 didn't check 15 720 didn't check

# Building the wordlist

I identified two wordlists which I might use: the scrabble dictionary, and googles billion word corpus. Unfortunately, the scrabble dictionary has so many weird words I've never heard of before, and google's dataset has a heap of misspellings and gibberish. To get my wordlist I found the intersection of the two, with the hope being that the result will be just reasonably common proper English words.

# The code

I used python-constraint to solve the squares. A basic (very slow) implementation is shown below

from constraint import *problem = Problem()s = 6problem.addVariables(range(s*s), list("abcdefghijklmnopqrstuvwxyz"))# build the wordlist from the wordlist text filewordlist = {}f = open("fewer_words.txt")for line in f:    w = line.split().lower()    wordlist[w] = 1# we need a function: true if word is in wordlist, false if not# this is characterwise checking, so a sequence of chars is provided, # we need to join them into a string before checking if they are in the listdef inlist(*word):     w = ''.join(word)    if w in wordlist: return True    return False   # add row and col constraintsfor i in range(0,s): problem.addConstraint(inlist,range(s*i,s*i+s))for i in range(0,s): problem.addConstraint(inlist,range(i,s*s,s))# pretty print the outputa = problem.getSolution()for i in range(s):    for j in range(s): print a[i*s+j],    print('')

It can be sped up a lot by adding way more constraints on what bigrams, trigrams etc are possible. I apologise for the scruffy code, I was more concerned with getting it to work efficiently than making it look nice

from constraint import *from timeit import default_timer as timerfrom itertools import combinationsproblem = Problem()s = 3problem.addVariables(list(range(s*s)), list("abcdefghijklmnopqrstuvwxyz"))wordlist = {}f = open("fewer_words.txt")for line in f:    w = line.split().lower()    if len(w) != s: continue     wordlist[w] = 1# build a list of monograms in each positionmono = []for j in range(s): mono.append({})for i in wordlist.keys():    for j in range(s):        mono[j][i[j]] = 1    # build a list of bigrams at each pair of positionsbgpos = list(combinations(list(range(s)),2))bg = {}for j in bgpos: bg[j] = {}for i in wordlist.keys():    for j in bgpos:        bg[j][i[j]+i[j]] = 1        tripos = list(combinations(list(range(s)),3))tri = {}for j in tripos: tri[j] = {}for i in wordlist.keys():    for j in tripos:        tri[j][i[j]+i[j]+i[j]] = 1qpos = list(combinations(list(range(s)),4))q = {}for j in qpos: q[j] = {}for i in wordlist.keys():    for j in qpos:        q[j][i[j]+i[j]+i[j]+i[j]] = 1fpos = list(combinations(list(range(s)),5))f = {}for j in fpos: f[j] = {}for i in wordlist.keys():    for j in fpos:        f[j][i[j]+i[j]+i[j]+i[j]+i[j]] = 1def opposite():    return lambda x,y: x==y    # we need a function: true if word is in wordlist, false if notdef inlist(*word):     w = ''.join(word)    if w in wordlist: return True    return False   def inmono(i):    return lambda x: x in mono[i]    def inbg(i,j):    return lambda *x: ''.join(x) in bg[(i,j)]      def intri(i,j,k):    return lambda *x: ''.join(x) in tri[(i,j,k)]   def inq(i,j,k,l):    return lambda *x: ''.join(x) in q[(i,j,k,l)]   def inf(i,j,k,l,m):    return lambda *x: ''.join(x) in f[(i,j,k,l,m)]       # add row and col constraintsfor i in range(0,s): problem.addConstraint(inlist,list(range(s*i,s*i+s)))for i in range(0,s): problem.addConstraint(inlist,list(range(i,s*s,s)))# add row monogram constraintsfor j in range(s):    for i in range(s):        problem.addConstraint(inmono(i),[j*s +i])        problem.addConstraint(inmono(i),[i*s +j])#print(bg.keys())for j in range(s):    for a,b in bgpos:        problem.addConstraint(inbg(a,b),[j*s +a,j*s +b])        problem.addConstraint(inbg(a,b),[a*s +j,b*s +j])for j in range(s):    for a,b,c in tripos:        problem.addConstraint(intri(a,b,c),[j*s +a,j*s +b,j*s +c])        problem.addConstraint(intri(a,b,c),[a*s +j,b*s +j,c*s +j])for j in range(s):    for a,b,c,d in qpos:        problem.addConstraint(inq(a,b,c,d),[j*s +a,j*s +b,j*s +c,j*s +d])        problem.addConstraint(inq(a,b,c,d),[a*s +j,b*s +j,c*s +j,d*s +j])for j in range(s):    for a,b,c,d,e in fpos:        problem.addConstraint(inf(a,b,c,d,e),[j*s +a,j*s +b,j*s +c,j*s +d,j*s +e])        problem.addConstraint(inf(a,b,c,d,e),[a*s +j,b*s +j,c*s +j,d*s +j,e*s +j])# pretty print the outputstart = timer()for count,a in enumerate(problem.getSolutionIter()):  if count % 1000 == 0: print 'count:',count     for i in range(s):    for j in range(s): print a[i*s+j],    print('')  print('')  end = timer()print("solution time = " + str(end-start))print countfor i in range(s):    for j in range(s): print a[i*s+j],    print ''print ''