Pages

Monday, 2 January 2017

Magic Word Squares with python-constraint

This post will deal with the seemingly boring topic of magic word squares - I define these to be squares of letters where each row is a dictionary word, and each column is also a dictionary word. In my wordlist, there are 3,641 4-letter words (I talk about how I generated the wordlist below). How many possible 4 by 4 word squares do you think there are? Would you believe 90.6 million? I was rather surprised to discover there were so many. What about 8 by 8 word squares (given there are 12,317 8 letter words in the wordlist)? would you believe 0? It turns out it gets much harder to find word squares as you increase the side length.

I'll display some of the word squares and other stats before I display the python code I used to generate all these things further down.

Some example word squares

a z o
n o w
t o n
f u z z
a l o e
g a r s    
s n i t
f r i z z
l a r e e
o d o r s
p i n o t
s i s s y
z y g o t e
y e l p e d
g l u i n g
o p i a t e
t e n t e r
e d g e r s
e s c a p e s
s e a l a n t
c a s e t t e
a l e r t e r
p a t t e r n
e n t e r a l
s t e r n l y

How many word squares are there?

I counted how many word squares there are for each side length. I was suprised how many there are for 4x4 squares, but then there are no 8x8 squares or higher. The No. of words is how many words of that particular length occur in the wordlist.

Side LengthNo. of wordsNo. of squares
31,0142,607,858
43,64190,669,454
56,68310,991,077
69,55968,518
711,96098
812,3170
911,1810
108,760didn't check
116,296didn't check
124,082didn't check
132,493didn't check
141,326didn't check
15720didn't check

Building the wordlist

I identified two wordlists which I might use: the scrabble dictionary, and googles billion word corpus. Unfortunately, the scrabble dictionary has so many weird words I've never heard of before, and google's dataset has a heap of misspellings and gibberish. To get my wordlist I found the intersection of the two, with the hope being that the result will be just reasonably common proper English words.

The code

I used python-constraint to solve the squares. A basic (very slow) implementation is shown below

from constraint import *
problem = Problem()
s = 6
problem.addVariables(range(s*s), list("abcdefghijklmnopqrstuvwxyz"))

# build the wordlist from the wordlist text file
wordlist = {}
f = open("fewer_words.txt")
for line in f:
    w = line.split()[0].lower()
    wordlist[w] = 1

# we need a function: true if word is in wordlist, false if not
# this is characterwise checking, so a sequence of chars is provided, 
# we need to join them into a string before checking if they are in the list
def inlist(*word): 
    w = ''.join(word)
    if w in wordlist: return True
    return False
   
# add row and col constraints
for i in range(0,s): problem.addConstraint(inlist,range(s*i,s*i+s))
for i in range(0,s): problem.addConstraint(inlist,range(i,s*s,s))

# pretty print the output
a = problem.getSolution()
for i in range(s):
    for j in range(s): print a[i*s+j],
    print('')

It can be sped up a lot by adding way more constraints on what bigrams, trigrams etc are possible. I apologise for the scruffy code, I was more concerned with getting it to work efficiently than making it look nice

from constraint import *
from timeit import default_timer as timer
from itertools import combinations

problem = Problem()
s = 3
problem.addVariables(list(range(s*s)), list("abcdefghijklmnopqrstuvwxyz"))

wordlist = {}
f = open("fewer_words.txt")
for line in f:
    w = line.split()[0].lower()
    if len(w) != s: continue 
    wordlist[w] = 1

# build a list of monograms in each position
mono = []
for j in range(s): mono.append({})
for i in wordlist.keys():
    for j in range(s):
        mono[j][i[j]] = 1
    
# build a list of bigrams at each pair of positions
bgpos = list(combinations(list(range(s)),2))
bg = {}
for j in bgpos: bg[j] = {}
for i in wordlist.keys():
    for j in bgpos:
        bg[j][i[j[0]]+i[j[1]]] = 1
        
tripos = list(combinations(list(range(s)),3))
tri = {}
for j in tripos: tri[j] = {}
for i in wordlist.keys():
    for j in tripos:
        tri[j][i[j[0]]+i[j[1]]+i[j[2]]] = 1

qpos = list(combinations(list(range(s)),4))
q = {}
for j in qpos: q[j] = {}
for i in wordlist.keys():
    for j in qpos:
        q[j][i[j[0]]+i[j[1]]+i[j[2]]+i[j[3]]] = 1

fpos = list(combinations(list(range(s)),5))
f = {}
for j in fpos: f[j] = {}
for i in wordlist.keys():
    for j in fpos:
        f[j][i[j[0]]+i[j[1]]+i[j[2]]+i[j[3]]+i[j[4]]] = 1

def opposite():
    return lambda x,y: x==y
    
# we need a function: true if word is in wordlist, false if not
def inlist(*word): 
    w = ''.join(word)
    if w in wordlist: return True
    return False
   
def inmono(i):
    return lambda x: x in mono[i]
    
def inbg(i,j):
    return lambda *x: ''.join(x) in bg[(i,j)]   
   
def intri(i,j,k):
    return lambda *x: ''.join(x) in tri[(i,j,k)]   

def inq(i,j,k,l):
    return lambda *x: ''.join(x) in q[(i,j,k,l)]   

def inf(i,j,k,l,m):
    return lambda *x: ''.join(x) in f[(i,j,k,l,m)]   
    
# add row and col constraints
for i in range(0,s): problem.addConstraint(inlist,list(range(s*i,s*i+s)))
for i in range(0,s): problem.addConstraint(inlist,list(range(i,s*s,s)))

# add row monogram constraints
for j in range(s):
    for i in range(s):
        problem.addConstraint(inmono(i),[j*s +i])
        problem.addConstraint(inmono(i),[i*s +j])

#print(bg.keys())
for j in range(s):
    for a,b in bgpos:
        problem.addConstraint(inbg(a,b),[j*s +a,j*s +b])
        problem.addConstraint(inbg(a,b),[a*s +j,b*s +j])

for j in range(s):
    for a,b,c in tripos:
        problem.addConstraint(intri(a,b,c),[j*s +a,j*s +b,j*s +c])
        problem.addConstraint(intri(a,b,c),[a*s +j,b*s +j,c*s +j])

for j in range(s):
    for a,b,c,d in qpos:
        problem.addConstraint(inq(a,b,c,d),[j*s +a,j*s +b,j*s +c,j*s +d])
        problem.addConstraint(inq(a,b,c,d),[a*s +j,b*s +j,c*s +j,d*s +j])

for j in range(s):
    for a,b,c,d,e in fpos:
        problem.addConstraint(inf(a,b,c,d,e),[j*s +a,j*s +b,j*s +c,j*s +d,j*s +e])
        problem.addConstraint(inf(a,b,c,d,e),[a*s +j,b*s +j,c*s +j,d*s +j,e*s +j])

# pretty print the output
start = timer()
for count,a in enumerate(problem.getSolutionIter()):
  if count % 1000 == 0: print 'count:',count   
  for i in range(s):
    for j in range(s): print a[i*s+j],
    print('')
  print('')  
end = timer()

print("solution time = " + str(end-start))
print count

for i in range(s):
    for j in range(s): print a[i*s+j],
    print ''
print ''  

No comments:

Post a Comment