Pages

Tuesday, 31 January 2017

Generating Gray codes with python

Gray codes are often used in positioning systems, see the wiki article for more details. I want to generate gray codes with python, and I'd like to generate all possible gray codes for a specific number of positions and number of bits. The maximum number of positions for a gray code is 2^numbits i.e. a 4 bit gray code can have at most 16 positions. It can also have fewer. If you want the gray code to be circular i.e. the first and last code have hamming distance 1, then there needs to be an even number of positions. Odd is fine if you don't care if it is circular.

There are of course algorithms for generating gray codes, but a constraint solver works really well, and also allows you to generate all possible gray codes for a specific number of positions. The python code below is for generating gray codes using python-constraint:

from constraint import *
from itertools import product

numbits = 4
numpos = 16

# this generates a list of all binary vectors of numbits bits
vecs = list(product(*(((1,0),)*numbits)))

problem = Problem()
# we will have numpos variables, i.e. one var for each code,
# each var is an index into vecs, has 2^numbits possible vals
problem.addVariables(list(range(numpos)), list(range(len(vecs))))

# vecs holds our binary vectors. 
# we want True if hamming distance between vecs[a] and vecs[b] is 1
# False otherwise. This will be our constraint. 
def hammingDist1(a,b):
    hamming = sum([vecs[a][i]^vecs[b][i] for i in range(numbits)])
    if hamming == 1: return True
    return False

# add the important constraints
for i in range(numpos):
    j = (i+1)%numpos
    problem.addConstraint(hammingDist1,[i,j])

# make sure all the vectors are different    
problem.addConstraint(AllDifferentConstraint())

# print out the first solution we come across
sol = problem.getSolution()
for i,j in enumerate(sol):
    print(vecs[sol[i]])
    

The main thing to note is that we add constraints so that each code has a hamming distance of 1 with its neighbour, and also that all the codes are diferent.

An example run looks like this (for a 4-bit, 16 position code):

C:\Users\james\Desktop>python gray.py
(0, 0, 0, 0)
(0, 0, 0, 1)
(0, 0, 1, 1)
(0, 0, 1, 0)
(0, 1, 1, 0)
(0, 1, 1, 1)
(0, 1, 0, 1)
(1, 1, 0, 1)
(1, 0, 0, 1)
(1, 0, 0, 0)
(1, 0, 1, 0)
(1, 0, 1, 1)
(1, 1, 1, 1)
(1, 1, 1, 0)
(1, 1, 0, 0)
(0, 1, 0, 0)

Monday, 16 January 2017

Latex Dictionary Template

I recently needed a latex dictionary template, there are a couple around on the internet, but they are all under non-commercial licences so I couldn't use them. Fortunately it is pretty easy to make one yourself, I have provided one here. Consider it to be under the wtfpl licence.

A short example of what the definitions look like:

%Copyright 2017 James Lyons
%This work is free. You can redistribute it and/or modify it under the
%terms of the Do What The Fuck You Want To Public License, Version 2,
%as published by Sam Hocevar. See http://www.wtfpl.net/ for more details.
\documentclass[twoside]{book}
\usepackage[utf8]{inputenc}
\usepackage{multicol}
\usepackage{fancyhdr} 
\usepackage[bindingoffset=0.2in,margin=0.75in,paperwidth=5.25in,paperheight=8in]{geometry}

% set the headers so that first mark is always on left, second on right
% this is so that each word is put in the header correctly 
\pagestyle{fancy} 
\fancyhead[LO,LE]{\rightmark}
\fancyhead[RE,RO]{\leftmark}
% set the font
\renewcommand*\rmdefault{bch}
\newcommand{\comment}[1]{}
% remove chapter numbering
\usepackage{titlesec}
\titleformat{\chapter}
  {\Large\bfseries} % format
  {}                % label
  {0pt}             % sep
  {\huge}           % before-code
% remove section numbering
\setcounter{secnumdepth}{0}
% this is how we define the actual dictionary items. markboth is for 
% setting the page headers so that first and last words appear there.
\newcommand{\ditem}[1]{\item[#1] \markboth{\footnotesize \textbf{#1}}{\footnotesize \textbf{#1}}}
\begin{document}
%%%%%%%%%%%%%%%%%%%%%%%% title page
\begin{center}

\thispagestyle{empty}
\vbox{\vspace{5cm}
{\huge The Dictionary Title}}
\vfill
{\large Author}
\vfill

\end{center}
\clearpage
%%%%%%%%%%%%%%%%%%%%%%%% table of contents
\tableofcontents

%%%%%%%%%%%%%%%%%%%%%%%% Preface
\chapter{Preface}

This is the preface.

\chapter{The Dictionary}
% we'd like the text size to be a bit smaller for the definitions
\footnotesize
% we'd like two columns of definitions
\begin{multicols}{2}

\begin{description}
\section{A}
\ditem{Abandon} \textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abandoned} \textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abide} \textit{-v.} definition 1 \textit{-n.} definition 2
\end{description}

\end{multicols}
 
\end{document}

Sunday, 15 January 2017

Creating crosswords with python-constraint [part 2]

In the previous post I was generating the patterns of white and black squares. In this post I'd like to actually fill the patterns with words so that everything matches up. This sort of problem is very well suited to a constraint satisfaction solver, as it can try many possibilities until something is found that works. The trick is specifying the constraints in an efficient manner. My first solution was to implement it the same way as the magic word squares, but this was really inefficient and there is a much better way.

I put the code for this on github, it's a bit messy but works: https://github.com/jameslyons/crossword_gen.

The trick is you don't need to represent all the different squares in the crossword as separate variables and put the constraints on those, your actual variables will be indexes into the word list, and the constraints will be whether e.g. character 4 of word 1 is the same as character 2 of word 7. These constraints will obviously be different for every crossword and will have to be generated each time, but this isn't too hard.

The inputs from the previous step are hex strings e.g.

3EF9C51EBFAFAAF8FE37577BFB9D5CEFEF75763F8FAAFAFEBC51CFBEg

This can be converted to a binary string by replacing e.g. 3 with '0011', same as decoding hex anywhere. The function I use is down the bottom of the previous post. This gives a binary string that can be represented as the white and black squares of the crossword.

Example run

The code takes as argument the hex representation of the pattern. It first prints a representation of the crossword with 'X's where the letters go, then once it gets solved it outputs the solved crossword, along with the list of words that got used. The dictionary I am using has some weird words in it, so you need to pay some attention to that if you want nice crosswords (I recommend using something like the 3of6game.txt from the 12dicts wordlist, it is specifically for games). The final thing that gets printed is the crossword represented as a string, this is passed to some typesetting code I have that produces nice pdfs of the crosswords.

>pypy cw_fill.py BEFB4516BFAFAAF8FE27573BFBDD5DEFEE75723F8FAAFAFEB4516FBEh
building word lists...
X   X X X X X   X X X X X   X
X   X       X   X       X   X
X   X   X X X X X X X   X   X
X X X X   X   X   X   X X X X
X       X X X X X X X       X
    X X X   X   X   X X X
X X X   X X X X X X X   X X X
X   X X X   X   X   X X X   X
X X X   X X X X X X X   X X X
    X X X   X   X   X X X
X       X X X X X X X       X
X X X X   X   X   X   X X X X
X   X   X X X X X X X   X   X
X   X       X   X       X   X
X   X X X X X   X X X X X   X
starting...
count: 0
s . c r u m b . r e m a n . b
w . r . . . r . i . . . o . r
a . a . p h o e b u s . s . i
m o w n . e . l . r . g y m s
i . . . c r i m i n e . . . k
. . p o o . n . n . v e g . .
a v o . i n f a u n a . i l k
s . s u n . l . t . d i g . i
p h i . a l a n i n e . o f t
. . t a g . m . l . r a t . .
i . . . e y e l e s s . . . o
m o t s . e . o . p . w a t t
i . o . s t o p g a p . m . t
d . r . . . a . e . . . i . a
e . c l a c k . l u l l s . r

['swami', 'asp', 'imide', 'craw', 'posit', 'torc', 'coinage', 'her', 'yet', 'bro',
 'inflame', 'oak', 'elm', 'lop', 'rib', 'inutile', 'gel', 'urn', 'spa', 'evaders',
 'nosy', 'gigot', 'amis', 'brisk', 'kit', 'ottar', 'crumb', 'reman', 'phoebus',
 'mown', 'gyms', 'crimine', 'poo', 'veg', 'avo', 'infauna', 'ilk', 'sun', 'dig',
 'phi', 'alanine', 'oft', 'tag', 'rat', 'eyeless', 'mots', 'watt', 'stopgap',
 'clack', 'lulls']
solution time = 1.05939997405
s.crumb.reman.bw.r...r.i...o.ra.a.phoebus.s.imown.e.l.r.gymsi...crimine...k..p
oo.n.n.veg..avo.infauna.ilks.sun.l.t.dig.iphi.alanine.oft..tag.m.l.rat..i...ey
eless...omots.e.o.p.watti.o.stopgap.m.td.r...a.e...i.ae.clack.lulls.r

The code

This takes the hex string output from the pattern generation step as an argument. It uses a wordlist called 'fewerwords.txt'. Each time it is run it should give different results because the word indexes are shuffled each time. Sometimes it will take a long time to find a solution, I suggest just killing it and running it again. Most of the time this will get a new starting point that allows it to be solved very quickly, though some patterns are just harder to solve than others.

from __future__ import print_function
from constraint import *
from timeit import default_timer as timer
from itertools import combinations
from math import sqrt
from test_bin2hex import hex2bin
import sys

''' to run from pypy
pypy cw_fill.py 3EF9C51EBFAFAAF8FE37577BFB9D5CEFEF75763F8FAAFAFEBC51CFBEg
'''
# this program will take a crossword pattern and try to fill it with words

if len(sys.argv) < 2: print('supply pattern as argument')
else:
    pattern = hex2bin(sys.argv[1]) 
    pattern = [int(i) for i in pattern]

s = int(sqrt(len(pattern)))

print('building word lists...')
# extract the word positions from the pattern
wordlocs = []
hwordlocs = []
vwordlocs = []
for i in range(s):
    hword =  []
    vword = []
    for j in range(s):
        if pattern[i*s + j] == 0: print(' ',end=' ')
        else: print('X',end=' ')
        if pattern[i*s + j] == 1:
            hword.append(i*s + j)
        else:
            if len(hword) > 1: hwordlocs.append(hword)
            hword = []         
        
        if pattern[j*s + i] == 1:
            vword.append(j*s + i)
        else:
            if len(vword) > 1: vwordlocs.append(vword)
            vword = []
        
        if j == s-1:
            if len(vword) > 1 and vword not in vwordlocs: vwordlocs.append(vword)
            if len(hword) > 1 and hword not in hwordlocs: hwordlocs.append(hword)
            
    print('')
wordlocs.extend(vwordlocs)
wordlocs.extend(hwordlocs)

# now that we have the words, add constraints so each word is in the dictionary
maxL = 10

wordlist = []
f = open("fewwords.txt")
for line in f:
    w = line.split()[0].lower()
    wordlist.append(w)

wordlen = []
olen = 0
for c,word in enumerate(wordlist):
    if olen < len(word):
        wordlen.append(c)
    olen = len(word)

problem = Problem()
from random import shuffle

for c,w in enumerate(wordlocs):
    L = len(w)-3
    r = list(range(wordlen[L],wordlen[L+1]))
    shuffle(r)
    problem.addVariables([c], list(r))

problem.addConstraint(AllDifferentConstraint())

def eq(a,b):
    def temp(c1,c2):
        word1 = wordlist[c1]
        word2 = wordlist[c2]
        if word1[a]==word2[b]: return True
        return False
    return temp
        
for c1,word1 in enumerate(wordlocs):
    for c2,word2 in enumerate(wordlocs):
        if word1==word2: continue
        intsc = list(set(word1).intersection(set(word2)))
        if len(intsc) > 0:
            a = word1.index(intsc[0])
            b = word2.index(intsc[0])
            problem.addConstraint(eq(a,b),[c1,c2])    
        
print('starting...')
        
def getwords(solution):
    listq = []
    for locs in wordlocs:
        word = ''.join([solution[loc] for loc in locs])
        listq.append(word)        
    return listq
              
# pretty print the output
start = timer()
b = ''
for count,a in enumerate(problem.getSolutionIter()):  
    print('count: '+str(count))
    sol = ['.' for i in range(s*s)]
    for key,val in a.iteritems():
        for c,char in enumerate(wordlist[val]):
            sol[wordlocs[key][c]] = char      
    b=sol
    for i in range(s):
        for j in range(s): 
            print(sol[i*s+j],end=' ')
        print('')
    print('')  
    # get the words we have used
    zz = getwords(sol)
    print(zz)
    break
end = timer()
print("solution time = " + str(end-start))
print(''.join([str(b[i]) for i in range(s*s)]))

Creating Crosswords with python-constraint [part 1]

Intro

I put the code for this on github, it's a bit messy but works: https://github.com/jameslyons/crossword_gen.

In previous posts I have been playing with python-constraint, and building crosswords is something that interests me for some reason. I decided to do it in a two step process: step 1 generate the patterns of white and black squares, step 2 fill the patterns with letters. This post will deal with step 1 only, the next post will deal with step 2.

Generating the patterns is well suited to using a constraint solver since you can just specify the things you'd like to be true of the final result, then ask the constraint solver to do it for you. In this case we have e.g. 15 by 15 binary variables, we choose the constraints so that the final pattern looks like a crossword puzzle.

What do pretty patterns look like

I am interested in automatically generating patterns that look like nice crossword patterns e.g.

Some of the things I'd like to be true of the patterns I generate:

  1. I'd like 4-fold symmetry, because I think it looks good.
  2. No 2x2 blocks of white squares.
  3. No 2x2 blocks of black squares.
  4. No length 1 words.
  5. No length 2 words.
  6. No words longer than MAXLEN.
  7. No sequences of black squares longer than 3.
  8. White squares must be fully connected.

Some of these conditions are pretty easy to implement with python-constraint, the fully connected constraint I can just apply to the final solutions and discard them at that point. By fully connected I mean you can get from one square to every other square in the pattern, so you can't get disjoint islands.

The implementation

Examples of what the program outputs:

3EF9C51EBFAFAAF8FE37577BFB9D5CEFEF75763F8FAAFAFEBC51CFBEg
X X . . . . . X . . . . . X X
. . . X X X . X . X X X . . .
. X . X . . . . . . . X . X .
. . . . X . X . X . X . . . .
. X X X . . . . . . . X X X .
. X . . . X . X . X . . . X .
. . . X . . . . . . . X . . .
X X . . . X . X . X . . . X X
. . . X . . . . . . . X . . .
. X . . . X . X . X . . . X .
. X X X . . . . . . . X X X .
. . . . X . X . X . X . . . .
. X . X . . . . . . . X . X .
. . . X X X . X . X X X . . .
X X . . . . . X . . . . . X X

The hex string is the binary to hex encoding of the pattern, 'X's are zeros, '.'s are ones. The hex string will be passed to step 2 which will fill in the words using a wordlist.

It's pretty easy to generate a few million example patterns with this program (if you are willing to wait a day), which is what I suggest doing. Once you have a heap of patterns you can further filter them by whatever criteria you want e.g. no more than X 3-letter words, no more than X% black squares etc. Once you have the filtered list, you can try filling them with words.

from __future__ import print_function
from constraint import *
from itertools import combinations
from math import sqrt,floor

# this program will create crossword patterns, but not fill it with words
# i.e. just patterns of zeros and ones for the black and white squares

# tests if a solution is fully connected i.e. can every point be reached from
# any random point. True if fullyconnected, false if disjoint parts.
# Could easily be made more efficient, but this works fine.
def isfullyconnected(solution):
    L = int(sqrt(len(solution)))
    # get list of all non-zero points
    plist = []
    for i in range(L):
        for j in range(L):
            if solution[i*L + j]: plist.append((i,j))
    
    # start top right corner, try to get to every other point
    reached = {plist[0]:1}
    rlen = len(reached)
    oldrlen = 0
    while rlen != oldrlen:
      temp = list(reached.keys())
      for i,j in temp:
        if (i+1,j) in plist: reached[(i+1,j)]=1
        if (i-1,j) in plist: reached[(i-1,j)]=1
        if (i,j+1) in plist: reached[(i,j+1)]=1
        if (i,j-1) in plist: reached[(i,j-1)]=1
      oldrlen = rlen
      rlen = len(reached)
    # if rlen is not eq number of all points, we couldn't reach some points
    if rlen < len(plist): return False
    return True    
    
from random import randint
    
def getPattern(sidelen=15,maxlen=9):
    problem = Problem()
    s = sidelen
    maxWordLen = maxlen
    problem.addVariables(list(range(s*s)), [0,1])
  
    # ensure there are no 2x2 blocks of ones or zeros
    def no4x4block(*x):
        if sum(x)==0: return False
        if sum(x)==len(x): return False
        else: return True

    for i in range(s-1):
        for j in range(s-1):
            problem.addConstraint(no4x4block,[i*s+j,(i*s+j)+1,(i*s+j)+s,(i*s+j)+s+1])
    
    #ensure matrix is symmetric  
    def eq(a,b):  
        return a==b

    for i in range(s):
        for j in range(s):
            problem.addConstraint(eq,[i*s+j,(s-i-1)*s+j])
            problem.addConstraint(eq,[i*s+j,i*s+(s-j-1)])
            problem.addConstraint(eq,[i*s+j,(s-i-1)*s+(s-j-1)])
       
    #ensure no words longer than maxWordLen
    def maxlen(*x):
        if sum(x) == len(x): return False
        else: return True

    seq = list(range(maxWordLen+1))
    for i in range(s - maxWordLen):
        for j in range(s):
            problem.addConstraint(maxlen,[(k+i)+j*s for k in seq])
            problem.addConstraint(maxlen,[(k+i)*s+j for k in seq])
    
    # no two letter words
    def no2letter(*x):
        if x == (0,1,1,0): return False
        else: return True
    
    seq = list(range(4))
    for i in range(s - 3):
        for j in range(s):
            problem.addConstraint(no2letter,[(k+i)+j*s for k in seq])
            problem.addConstraint(no2letter,[(k+i)*s+j for k in seq])

    # no two letter words around the edge either
    def no2letterEdge(*x):
        if x == (1,1,0): return False
        else: return True
    
    seq = list(range(4))
    for i in range(s):
        problem.addConstraint(no2letterEdge,[i,i+s,i+2*s])
        problem.addConstraint(no2letterEdge,[i*s,i*s+1,i*s+2])
        problem.addConstraint(no2letterEdge,[i*s+s-1,i*s+s-2,i*s+s-3])
        problem.addConstraint(no2letterEdge,[i+s*(s-1),i+s*(s-2),i+s*(s-3)])
    
    # no one letter squares
    def no1letter(*x):    
        if sum(x) == 0: return True
        if sum(x[1:]) == 0: return False
        return True
    
    seq = list(range(4))
    for i in range(s):
      for j in range(s):
        vals = [i*s+j]
        if i-1 >= 0: vals.append((i-1)*s+j) 
        if i+1 < s: vals.append((i+1)*s+j) 
        if j-1 >= 0: vals.append(i*s+(j-1)) 
        if j+1 < s: vals.append(i*s+(j+1)) 
        problem.addConstraint(no1letter,vals)
 
    # no more than three 0's in a row
    def nothreeblank(*x):
        if sum(x) == 0: return False
        return True
    
    seq = list(range(4))
    for i in range(s - 3):
        for j in range(s):
            problem.addConstraint(nothreeblank,[(k+i)+j*s for k in seq])
            problem.addConstraint(nothreeblank,[(k+i)*s+j for k in seq])
   
    for sol in problem.getSolutionIter():  
        if not isfullyconnected(sol): continue
        yield(sol)
    
from test_bin2hex import bin2hex
import sys

print(len(sys.argv))
if len(sys.argv)<2: sidelen = 11
else: sidelen = int(sys.argv[1])
print('using sidelen',sidelen)

f = open('pattern'+str(sidelen)+'.txt','w')
for count,sol in enumerate(getPattern(sidelen=sidelen,maxlen=8)):
    f.write(bin2hex(''.join([str(sol[a]) for a in range(sidelen*sidelen)]))+'\n')
    if count % 100 == 0: 
      print('count: '+str(count))
      print(bin2hex(''.join([str(sol[a]) for a in range(sidelen*sidelen)])))
      for i in range(sidelen):
        for j in range(sidelen): 
            if sol[i*sidelen+j]==1: print('.',end=' ')
            else: print('X',end=' ')
        print('')
      print('')
    #break

The hex string is generated using the following python script (test_bin2hex.py). The extra letters are so we can encode binary strings that aren't necessarily divisible by 4.


bin = ['0000','0001','0010','0011','0100','0101','0110','0111','1000','1001','1010','1011','1100','1101','1110','1111',
       '0','1','00','01','10','11','000','001','010','011','100','101','110','111']
hex = "0123456789ABCDEFghijklmnopqrst"

def bin2hex(binstr):
    map = dict(zip(bin,hex))
    ret = ''
    for i in range(0,len(binstr),4):
        sub = binstr[i:i+4]
        ret += map[sub]
    return ret
    
def hex2bin(hexstr):
    map = dict(zip(hex,bin))
    ret = ''
    for i in hexstr:
        ret += map[i]
    return ret

Monday, 2 January 2017

Magic Word Squares with python-constraint

This post will deal with the seemingly boring topic of magic word squares - I define these to be squares of letters where each row is a dictionary word, and each column is also a dictionary word. In my wordlist, there are 3,641 4-letter words (I talk about how I generated the wordlist below). How many possible 4 by 4 word squares do you think there are? Would you believe 90.6 million? I was rather surprised to discover there were so many. What about 8 by 8 word squares (given there are 12,317 8 letter words in the wordlist)? would you believe 0? It turns out it gets much harder to find word squares as you increase the side length.

I'll display some of the word squares and other stats before I display the python code I used to generate all these things further down.

Some example word squares

a z o
n o w
t o n
f u z z
a l o e
g a r s    
s n i t
f r i z z
l a r e e
o d o r s
p i n o t
s i s s y
z y g o t e
y e l p e d
g l u i n g
o p i a t e
t e n t e r
e d g e r s
e s c a p e s
s e a l a n t
c a s e t t e
a l e r t e r
p a t t e r n
e n t e r a l
s t e r n l y

How many word squares are there?

I counted how many word squares there are for each side length. I was suprised how many there are for 4x4 squares, but then there are no 8x8 squares or higher. The No. of words is how many words of that particular length occur in the wordlist.

Side LengthNo. of wordsNo. of squares
31,0142,607,858
43,64190,669,454
56,68310,991,077
69,55968,518
711,96098
812,3170
911,1810
108,760didn't check
116,296didn't check
124,082didn't check
132,493didn't check
141,326didn't check
15720didn't check

Building the wordlist

I identified two wordlists which I might use: the scrabble dictionary, and googles billion word corpus. Unfortunately, the scrabble dictionary has so many weird words I've never heard of before, and google's dataset has a heap of misspellings and gibberish. To get my wordlist I found the intersection of the two, with the hope being that the result will be just reasonably common proper English words.

The code

I used python-constraint to solve the squares. A basic (very slow) implementation is shown below

from constraint import *
problem = Problem()
s = 6
problem.addVariables(range(s*s), list("abcdefghijklmnopqrstuvwxyz"))

# build the wordlist from the wordlist text file
wordlist = {}
f = open("fewer_words.txt")
for line in f:
    w = line.split()[0].lower()
    wordlist[w] = 1

# we need a function: true if word is in wordlist, false if not
# this is characterwise checking, so a sequence of chars is provided, 
# we need to join them into a string before checking if they are in the list
def inlist(*word): 
    w = ''.join(word)
    if w in wordlist: return True
    return False
   
# add row and col constraints
for i in range(0,s): problem.addConstraint(inlist,range(s*i,s*i+s))
for i in range(0,s): problem.addConstraint(inlist,range(i,s*s,s))

# pretty print the output
a = problem.getSolution()
for i in range(s):
    for j in range(s): print a[i*s+j],
    print('')

It can be sped up a lot by adding way more constraints on what bigrams, trigrams etc are possible. I apologise for the scruffy code, I was more concerned with getting it to work efficiently than making it look nice

from constraint import *
from timeit import default_timer as timer
from itertools import combinations

problem = Problem()
s = 3
problem.addVariables(list(range(s*s)), list("abcdefghijklmnopqrstuvwxyz"))

wordlist = {}
f = open("fewer_words.txt")
for line in f:
    w = line.split()[0].lower()
    if len(w) != s: continue 
    wordlist[w] = 1

# build a list of monograms in each position
mono = []
for j in range(s): mono.append({})
for i in wordlist.keys():
    for j in range(s):
        mono[j][i[j]] = 1
    
# build a list of bigrams at each pair of positions
bgpos = list(combinations(list(range(s)),2))
bg = {}
for j in bgpos: bg[j] = {}
for i in wordlist.keys():
    for j in bgpos:
        bg[j][i[j[0]]+i[j[1]]] = 1
        
tripos = list(combinations(list(range(s)),3))
tri = {}
for j in tripos: tri[j] = {}
for i in wordlist.keys():
    for j in tripos:
        tri[j][i[j[0]]+i[j[1]]+i[j[2]]] = 1

qpos = list(combinations(list(range(s)),4))
q = {}
for j in qpos: q[j] = {}
for i in wordlist.keys():
    for j in qpos:
        q[j][i[j[0]]+i[j[1]]+i[j[2]]+i[j[3]]] = 1

fpos = list(combinations(list(range(s)),5))
f = {}
for j in fpos: f[j] = {}
for i in wordlist.keys():
    for j in fpos:
        f[j][i[j[0]]+i[j[1]]+i[j[2]]+i[j[3]]+i[j[4]]] = 1

def opposite():
    return lambda x,y: x==y
    
# we need a function: true if word is in wordlist, false if not
def inlist(*word): 
    w = ''.join(word)
    if w in wordlist: return True
    return False
   
def inmono(i):
    return lambda x: x in mono[i]
    
def inbg(i,j):
    return lambda *x: ''.join(x) in bg[(i,j)]   
   
def intri(i,j,k):
    return lambda *x: ''.join(x) in tri[(i,j,k)]   

def inq(i,j,k,l):
    return lambda *x: ''.join(x) in q[(i,j,k,l)]   

def inf(i,j,k,l,m):
    return lambda *x: ''.join(x) in f[(i,j,k,l,m)]   
    
# add row and col constraints
for i in range(0,s): problem.addConstraint(inlist,list(range(s*i,s*i+s)))
for i in range(0,s): problem.addConstraint(inlist,list(range(i,s*s,s)))

# add row monogram constraints
for j in range(s):
    for i in range(s):
        problem.addConstraint(inmono(i),[j*s +i])
        problem.addConstraint(inmono(i),[i*s +j])

#print(bg.keys())
for j in range(s):
    for a,b in bgpos:
        problem.addConstraint(inbg(a,b),[j*s +a,j*s +b])
        problem.addConstraint(inbg(a,b),[a*s +j,b*s +j])

for j in range(s):
    for a,b,c in tripos:
        problem.addConstraint(intri(a,b,c),[j*s +a,j*s +b,j*s +c])
        problem.addConstraint(intri(a,b,c),[a*s +j,b*s +j,c*s +j])

for j in range(s):
    for a,b,c,d in qpos:
        problem.addConstraint(inq(a,b,c,d),[j*s +a,j*s +b,j*s +c,j*s +d])
        problem.addConstraint(inq(a,b,c,d),[a*s +j,b*s +j,c*s +j,d*s +j])

for j in range(s):
    for a,b,c,d,e in fpos:
        problem.addConstraint(inf(a,b,c,d,e),[j*s +a,j*s +b,j*s +c,j*s +d,j*s +e])
        problem.addConstraint(inf(a,b,c,d,e),[a*s +j,b*s +j,c*s +j,d*s +j,e*s +j])

# pretty print the output
start = timer()
for count,a in enumerate(problem.getSolutionIter()):
  if count % 1000 == 0: print 'count:',count   
  for i in range(s):
    for j in range(s): print a[i*s+j],
    print('')
  print('')  
end = timer()

print("solution time = " + str(end-start))
print count

for i in range(s):
    for j in range(s): print a[i*s+j],
    print ''
print ''