This post will deal with the seemingly boring topic of magic word squares - I define these to be squares of letters where each row is a dictionary word, and each column is also a dictionary word. In my wordlist, there are 3,641 4-letter words (I talk about how I generated the wordlist below). How many possible 4 by 4 word squares do you think there are? Would you believe 90.6 million? I was rather surprised to discover there were so many. What about 8 by 8 word squares (given there are 12,317 8 letter words in the wordlist)? would you believe 0? It turns out it gets much harder to find word squares as you increase the side length.

I'll display some of the word squares and other stats before I display the python code I used to generate all these things further down.

# Some example word squares

a z o n o w t o n |
f u z z a l o e g a r s s n i t |
f r i z z l a r e e o d o r s p i n o t s i s s y |

z y g o t e y e l p e d g l u i n g o p i a t e t e n t e r e d g e r s |
e s c a p e s s e a l a n t c a s e t t e a l e r t e r p a t t e r n e n t e r a l s t e r n l y |

# How many word squares are there?

I counted how many word squares there are for each side length. I was suprised how many there are for 4x4 squares, but then there are no 8x8 squares or higher. The No. of words is how many words of that particular length occur in the wordlist.

Side Length | No. of words | No. of squares |

3 | 1,014 | 2,607,858 |

4 | 3,641 | 90,669,454 |

5 | 6,683 | 10,991,077 |

6 | 9,559 | 68,518 |

7 | 11,960 | 98 |

8 | 12,317 | 0 |

9 | 11,181 | 0 |

10 | 8,760 | didn't check |

11 | 6,296 | didn't check |

12 | 4,082 | didn't check |

13 | 2,493 | didn't check |

14 | 1,326 | didn't check |

15 | 720 | didn't check |

# Building the wordlist

I identified two wordlists which I might use: the scrabble dictionary, and googles billion word corpus. Unfortunately, the scrabble dictionary has so many weird words I've never heard of before, and google's dataset has a heap of misspellings and gibberish. To get my wordlist I found the intersection of the two, with the hope being that the result will be just reasonably common proper English words.

# The code

I used python-constraint to solve the squares. A basic (very slow) implementation is shown below

from constraint import * problem = Problem() s = 6 problem.addVariables(range(s*s), list("abcdefghijklmnopqrstuvwxyz")) # build the wordlist from the wordlist text file wordlist = {} f = open("fewer_words.txt") for line in f: w = line.split()[0].lower() wordlist[w] = 1 # we need a function: true if word is in wordlist, false if not # this is characterwise checking, so a sequence of chars is provided, # we need to join them into a string before checking if they are in the list def inlist(*word): w = ''.join(word) if w in wordlist: return True return False # add row and col constraints for i in range(0,s): problem.addConstraint(inlist,range(s*i,s*i+s)) for i in range(0,s): problem.addConstraint(inlist,range(i,s*s,s)) # pretty print the output a = problem.getSolution() for i in range(s): for j in range(s): print a[i*s+j], print('')

It can be sped up a lot by adding way more constraints on what bigrams, trigrams etc are possible. I apologise for the scruffy code, I was more concerned with getting it to work efficiently than making it look nice

from constraint import * from timeit import default_timer as timer from itertools import combinations problem = Problem() s = 3 problem.addVariables(list(range(s*s)), list("abcdefghijklmnopqrstuvwxyz")) wordlist = {} f = open("fewer_words.txt") for line in f: w = line.split()[0].lower() if len(w) != s: continue wordlist[w] = 1 # build a list of monograms in each position mono = [] for j in range(s): mono.append({}) for i in wordlist.keys(): for j in range(s): mono[j][i[j]] = 1 # build a list of bigrams at each pair of positions bgpos = list(combinations(list(range(s)),2)) bg = {} for j in bgpos: bg[j] = {} for i in wordlist.keys(): for j in bgpos: bg[j][i[j[0]]+i[j[1]]] = 1 tripos = list(combinations(list(range(s)),3)) tri = {} for j in tripos: tri[j] = {} for i in wordlist.keys(): for j in tripos: tri[j][i[j[0]]+i[j[1]]+i[j[2]]] = 1 qpos = list(combinations(list(range(s)),4)) q = {} for j in qpos: q[j] = {} for i in wordlist.keys(): for j in qpos: q[j][i[j[0]]+i[j[1]]+i[j[2]]+i[j[3]]] = 1 fpos = list(combinations(list(range(s)),5)) f = {} for j in fpos: f[j] = {} for i in wordlist.keys(): for j in fpos: f[j][i[j[0]]+i[j[1]]+i[j[2]]+i[j[3]]+i[j[4]]] = 1 def opposite(): return lambda x,y: x==y # we need a function: true if word is in wordlist, false if not def inlist(*word): w = ''.join(word) if w in wordlist: return True return False def inmono(i): return lambda x: x in mono[i] def inbg(i,j): return lambda *x: ''.join(x) in bg[(i,j)] def intri(i,j,k): return lambda *x: ''.join(x) in tri[(i,j,k)] def inq(i,j,k,l): return lambda *x: ''.join(x) in q[(i,j,k,l)] def inf(i,j,k,l,m): return lambda *x: ''.join(x) in f[(i,j,k,l,m)] # add row and col constraints for i in range(0,s): problem.addConstraint(inlist,list(range(s*i,s*i+s))) for i in range(0,s): problem.addConstraint(inlist,list(range(i,s*s,s))) # add row monogram constraints for j in range(s): for i in range(s): problem.addConstraint(inmono(i),[j*s +i]) problem.addConstraint(inmono(i),[i*s +j]) #print(bg.keys()) for j in range(s): for a,b in bgpos: problem.addConstraint(inbg(a,b),[j*s +a,j*s +b]) problem.addConstraint(inbg(a,b),[a*s +j,b*s +j]) for j in range(s): for a,b,c in tripos: problem.addConstraint(intri(a,b,c),[j*s +a,j*s +b,j*s +c]) problem.addConstraint(intri(a,b,c),[a*s +j,b*s +j,c*s +j]) for j in range(s): for a,b,c,d in qpos: problem.addConstraint(inq(a,b,c,d),[j*s +a,j*s +b,j*s +c,j*s +d]) problem.addConstraint(inq(a,b,c,d),[a*s +j,b*s +j,c*s +j,d*s +j]) for j in range(s): for a,b,c,d,e in fpos: problem.addConstraint(inf(a,b,c,d,e),[j*s +a,j*s +b,j*s +c,j*s +d,j*s +e]) problem.addConstraint(inf(a,b,c,d,e),[a*s +j,b*s +j,c*s +j,d*s +j,e*s +j]) # pretty print the output start = timer() for count,a in enumerate(problem.getSolutionIter()): if count % 1000 == 0: print 'count:',count for i in range(s): for j in range(s): print a[i*s+j], print('') print('') end = timer() print("solution time = " + str(end-start)) print count for i in range(s): for j in range(s): print a[i*s+j], print '' print ''

This comment has been removed by a blog administrator.

ReplyDelete