This post will deal with the seemingly boring topic of magic word squares  I define these to be squares of letters where each row is a dictionary word, and each column is also a dictionary word. In my wordlist, there are 3,641 4letter words (I talk about how I generated the wordlist below). How many possible 4 by 4 word squares do you think there are? Would you believe 90.6 million? I was rather surprised to discover there were so many. What about 8 by 8 word squares (given there are 12,317 8 letter words in the wordlist)? would you believe 0? It turns out it gets much harder to find word squares as you increase the side length.
I'll display some of the word squares and other stats before I display the python code I used to generate all these things further down.
Some example word squares





How many word squares are there?
I counted how many word squares there are for each side length. I was suprised how many there are for 4x4 squares, but then there are no 8x8 squares or higher. The No. of words is how many words of that particular length occur in the wordlist.
Side Length  No. of words  No. of squares 
3  1,014  2,607,858 
4  3,641  90,669,454 
5  6,683  10,991,077 
6  9,559  68,518 
7  11,960  98 
8  12,317  0 
9  11,181  0 
10  8,760  didn't check 
11  6,296  didn't check 
12  4,082  didn't check 
13  2,493  didn't check 
14  1,326  didn't check 
15  720  didn't check 
Building the wordlist
I identified two wordlists which I might use: the scrabble dictionary, and googles billion word corpus. Unfortunately, the scrabble dictionary has so many weird words I've never heard of before, and google's dataset has a heap of misspellings and gibberish. To get my wordlist I found the intersection of the two, with the hope being that the result will be just reasonably common proper English words.
The code
I used pythonconstraint to solve the squares. A basic (very slow) implementation is shown below
from constraint import *
problem = Problem()
s = 6
problem.addVariables(range(s*s), list("abcdefghijklmnopqrstuvwxyz"))
# build the wordlist from the wordlist text file
wordlist = {}
f = open("fewer_words.txt")
for line in f:
w = line.split()[0].lower()
wordlist[w] = 1
# we need a function: true if word is in wordlist, false if not
# this is characterwise checking, so a sequence of chars is provided,
# we need to join them into a string before checking if they are in the list
def inlist(*word):
w = ''.join(word)
if w in wordlist: return True
return False
# add row and col constraints
for i in range(0,s): problem.addConstraint(inlist,range(s*i,s*i+s))
for i in range(0,s): problem.addConstraint(inlist,range(i,s*s,s))
# pretty print the output
a = problem.getSolution()
for i in range(s):
for j in range(s): print a[i*s+j],
print('')
It can be sped up a lot by adding way more constraints on what bigrams, trigrams etc are possible. I apologise for the scruffy code, I was more concerned with getting it to work efficiently than making it look nice
from constraint import *
from timeit import default_timer as timer
from itertools import combinations
problem = Problem()
s = 3
problem.addVariables(list(range(s*s)), list("abcdefghijklmnopqrstuvwxyz"))
wordlist = {}
f = open("fewer_words.txt")
for line in f:
w = line.split()[0].lower()
if len(w) != s: continue
wordlist[w] = 1
# build a list of monograms in each position
mono = []
for j in range(s): mono.append({})
for i in wordlist.keys():
for j in range(s):
mono[j][i[j]] = 1
# build a list of bigrams at each pair of positions
bgpos = list(combinations(list(range(s)),2))
bg = {}
for j in bgpos: bg[j] = {}
for i in wordlist.keys():
for j in bgpos:
bg[j][i[j[0]]+i[j[1]]] = 1
tripos = list(combinations(list(range(s)),3))
tri = {}
for j in tripos: tri[j] = {}
for i in wordlist.keys():
for j in tripos:
tri[j][i[j[0]]+i[j[1]]+i[j[2]]] = 1
qpos = list(combinations(list(range(s)),4))
q = {}
for j in qpos: q[j] = {}
for i in wordlist.keys():
for j in qpos:
q[j][i[j[0]]+i[j[1]]+i[j[2]]+i[j[3]]] = 1
fpos = list(combinations(list(range(s)),5))
f = {}
for j in fpos: f[j] = {}
for i in wordlist.keys():
for j in fpos:
f[j][i[j[0]]+i[j[1]]+i[j[2]]+i[j[3]]+i[j[4]]] = 1
def opposite():
return lambda x,y: x==y
# we need a function: true if word is in wordlist, false if not
def inlist(*word):
w = ''.join(word)
if w in wordlist: return True
return False
def inmono(i):
return lambda x: x in mono[i]
def inbg(i,j):
return lambda *x: ''.join(x) in bg[(i,j)]
def intri(i,j,k):
return lambda *x: ''.join(x) in tri[(i,j,k)]
def inq(i,j,k,l):
return lambda *x: ''.join(x) in q[(i,j,k,l)]
def inf(i,j,k,l,m):
return lambda *x: ''.join(x) in f[(i,j,k,l,m)]
# add row and col constraints
for i in range(0,s): problem.addConstraint(inlist,list(range(s*i,s*i+s)))
for i in range(0,s): problem.addConstraint(inlist,list(range(i,s*s,s)))
# add row monogram constraints
for j in range(s):
for i in range(s):
problem.addConstraint(inmono(i),[j*s +i])
problem.addConstraint(inmono(i),[i*s +j])
#print(bg.keys())
for j in range(s):
for a,b in bgpos:
problem.addConstraint(inbg(a,b),[j*s +a,j*s +b])
problem.addConstraint(inbg(a,b),[a*s +j,b*s +j])
for j in range(s):
for a,b,c in tripos:
problem.addConstraint(intri(a,b,c),[j*s +a,j*s +b,j*s +c])
problem.addConstraint(intri(a,b,c),[a*s +j,b*s +j,c*s +j])
for j in range(s):
for a,b,c,d in qpos:
problem.addConstraint(inq(a,b,c,d),[j*s +a,j*s +b,j*s +c,j*s +d])
problem.addConstraint(inq(a,b,c,d),[a*s +j,b*s +j,c*s +j,d*s +j])
for j in range(s):
for a,b,c,d,e in fpos:
problem.addConstraint(inf(a,b,c,d,e),[j*s +a,j*s +b,j*s +c,j*s +d,j*s +e])
problem.addConstraint(inf(a,b,c,d,e),[a*s +j,b*s +j,c*s +j,d*s +j,e*s +j])
# pretty print the output
start = timer()
for count,a in enumerate(problem.getSolutionIter()):
if count % 1000 == 0: print 'count:',count
for i in range(s):
for j in range(s): print a[i*s+j],
print('')
print('')
end = timer()
print("solution time = " + str(endstart))
print count
for i in range(s):
for j in range(s): print a[i*s+j],
print ''
print ''
This comment has been removed by a blog administrator.
ReplyDelete