## Monday, 2 January 2017

### Magic Word Squares with python-constraint

This post will deal with the seemingly boring topic of magic word squares - I define these to be squares of letters where each row is a dictionary word, and each column is also a dictionary word. In my wordlist, there are 3,641 4-letter words (I talk about how I generated the wordlist below). How many possible 4 by 4 word squares do you think there are? Would you believe 90.6 million? I was rather surprised to discover there were so many. What about 8 by 8 word squares (given there are 12,317 8 letter words in the wordlist)? would you believe 0? It turns out it gets much harder to find word squares as you increase the side length.

I'll display some of the word squares and other stats before I display the python code I used to generate all these things further down.

# Some example word squares

 `a z on o wt o n` `f u z za l o eg a r s s n i t` `f r i z zl a r e eo d o r sp i n o ts i s s y` `z y g o t ey e l p e dg l u i n go p i a t et e n t e re d g e r s` `e s c a p e ss e a l a n tc a s e t t ea l e r t e rp a t t e r ne n t e r a ls t e r n l y`

# How many word squares are there?

I counted how many word squares there are for each side length. I was suprised how many there are for 4x4 squares, but then there are no 8x8 squares or higher. The No. of words is how many words of that particular length occur in the wordlist.

 Side Length No. of words No. of squares 3 1,014 2,607,858 4 3,641 90,669,454 5 6,683 10,991,077 6 9,559 68,518 7 11,960 98 8 12,317 0 9 11,181 0 10 8,760 didn't check 11 6,296 didn't check 12 4,082 didn't check 13 2,493 didn't check 14 1,326 didn't check 15 720 didn't check

# Building the wordlist

I identified two wordlists which I might use: the scrabble dictionary, and googles billion word corpus. Unfortunately, the scrabble dictionary has so many weird words I've never heard of before, and google's dataset has a heap of misspellings and gibberish. To get my wordlist I found the intersection of the two, with the hope being that the result will be just reasonably common proper English words.

# The code

I used python-constraint to solve the squares. A basic (very slow) implementation is shown below

`from constraint import *problem = Problem()s = 6problem.addVariables(range(s*s), list("abcdefghijklmnopqrstuvwxyz"))# build the wordlist from the wordlist text filewordlist = {}f = open("fewer_words.txt")for line in f:    w = line.split().lower()    wordlist[w] = 1# we need a function: true if word is in wordlist, false if not# this is characterwise checking, so a sequence of chars is provided, # we need to join them into a string before checking if they are in the listdef inlist(*word):     w = ''.join(word)    if w in wordlist: return True    return False   # add row and col constraintsfor i in range(0,s): problem.addConstraint(inlist,range(s*i,s*i+s))for i in range(0,s): problem.addConstraint(inlist,range(i,s*s,s))# pretty print the outputa = problem.getSolution()for i in range(s):    for j in range(s): print a[i*s+j],    print('')`

It can be sped up a lot by adding way more constraints on what bigrams, trigrams etc are possible. I apologise for the scruffy code, I was more concerned with getting it to work efficiently than making it look nice

`from constraint import *from timeit import default_timer as timerfrom itertools import combinationsproblem = Problem()s = 3problem.addVariables(list(range(s*s)), list("abcdefghijklmnopqrstuvwxyz"))wordlist = {}f = open("fewer_words.txt")for line in f:    w = line.split().lower()    if len(w) != s: continue     wordlist[w] = 1# build a list of monograms in each positionmono = []for j in range(s): mono.append({})for i in wordlist.keys():    for j in range(s):        mono[j][i[j]] = 1    # build a list of bigrams at each pair of positionsbgpos = list(combinations(list(range(s)),2))bg = {}for j in bgpos: bg[j] = {}for i in wordlist.keys():    for j in bgpos:        bg[j][i[j]+i[j]] = 1        tripos = list(combinations(list(range(s)),3))tri = {}for j in tripos: tri[j] = {}for i in wordlist.keys():    for j in tripos:        tri[j][i[j]+i[j]+i[j]] = 1qpos = list(combinations(list(range(s)),4))q = {}for j in qpos: q[j] = {}for i in wordlist.keys():    for j in qpos:        q[j][i[j]+i[j]+i[j]+i[j]] = 1fpos = list(combinations(list(range(s)),5))f = {}for j in fpos: f[j] = {}for i in wordlist.keys():    for j in fpos:        f[j][i[j]+i[j]+i[j]+i[j]+i[j]] = 1def opposite():    return lambda x,y: x==y    # we need a function: true if word is in wordlist, false if notdef inlist(*word):     w = ''.join(word)    if w in wordlist: return True    return False   def inmono(i):    return lambda x: x in mono[i]    def inbg(i,j):    return lambda *x: ''.join(x) in bg[(i,j)]      def intri(i,j,k):    return lambda *x: ''.join(x) in tri[(i,j,k)]   def inq(i,j,k,l):    return lambda *x: ''.join(x) in q[(i,j,k,l)]   def inf(i,j,k,l,m):    return lambda *x: ''.join(x) in f[(i,j,k,l,m)]       # add row and col constraintsfor i in range(0,s): problem.addConstraint(inlist,list(range(s*i,s*i+s)))for i in range(0,s): problem.addConstraint(inlist,list(range(i,s*s,s)))# add row monogram constraintsfor j in range(s):    for i in range(s):        problem.addConstraint(inmono(i),[j*s +i])        problem.addConstraint(inmono(i),[i*s +j])#print(bg.keys())for j in range(s):    for a,b in bgpos:        problem.addConstraint(inbg(a,b),[j*s +a,j*s +b])        problem.addConstraint(inbg(a,b),[a*s +j,b*s +j])for j in range(s):    for a,b,c in tripos:        problem.addConstraint(intri(a,b,c),[j*s +a,j*s +b,j*s +c])        problem.addConstraint(intri(a,b,c),[a*s +j,b*s +j,c*s +j])for j in range(s):    for a,b,c,d in qpos:        problem.addConstraint(inq(a,b,c,d),[j*s +a,j*s +b,j*s +c,j*s +d])        problem.addConstraint(inq(a,b,c,d),[a*s +j,b*s +j,c*s +j,d*s +j])for j in range(s):    for a,b,c,d,e in fpos:        problem.addConstraint(inf(a,b,c,d,e),[j*s +a,j*s +b,j*s +c,j*s +d,j*s +e])        problem.addConstraint(inf(a,b,c,d,e),[a*s +j,b*s +j,c*s +j,d*s +j,e*s +j])# pretty print the outputstart = timer()for count,a in enumerate(problem.getSolutionIter()):  if count % 1000 == 0: print 'count:',count     for i in range(s):    for j in range(s): print a[i*s+j],    print('')  print('')  end = timer()print("solution time = " + str(end-start))print countfor i in range(s):    for j in range(s): print a[i*s+j],    print ''print ''  `

1. 2. 