Pages

Tuesday, 31 January 2017

Generating Gray codes with python

Gray codes are often used in positioning systems, see the wiki article for more details. I want to generate gray codes with python, and I'd like to generate all possible gray codes for a specific number of positions and number of bits. The maximum number of positions for a gray code is 2^numbits i.e. a 4 bit gray code can have at most 16 positions. It can also have fewer. If you want the gray code to be circular i.e. the first and last code have hamming distance 1, then there needs to be an even number of positions. Odd is fine if you don't care if it is circular.

There are of course algorithms for generating gray codes, but a constraint solver works really well, and also allows you to generate all possible gray codes for a specific number of positions. The python code below is for generating gray codes using python-constraint:

from constraint import *
from itertools import product

numbits = 4
numpos = 16

# this generates a list of all binary vectors of numbits bits
vecs = list(product(*(((1,0),)*numbits)))

problem = Problem()
# we will have numpos variables, i.e. one var for each code,
# each var is an index into vecs, has 2^numbits possible vals
problem.addVariables(list(range(numpos)), list(range(len(vecs))))

# vecs holds our binary vectors. 
# we want True if hamming distance between vecs[a] and vecs[b] is 1
# False otherwise. This will be our constraint. 
def hammingDist1(a,b):
    hamming = sum([vecs[a][i]^vecs[b][i] for i in range(numbits)])
    if hamming == 1: return True
    return False

# add the important constraints
for i in range(numpos):
    j = (i+1)%numpos
    problem.addConstraint(hammingDist1,[i,j])

# make sure all the vectors are different    
problem.addConstraint(AllDifferentConstraint())

# print out the first solution we come across
sol = problem.getSolution()
for i,j in enumerate(sol):
    print(vecs[sol[i]])
    

The main thing to note is that we add constraints so that each code has a hamming distance of 1 with its neighbour, and also that all the codes are diferent.

An example run looks like this (for a 4-bit, 16 position code):

C:\Users\james\Desktop>python gray.py
(0, 0, 0, 0)
(0, 0, 0, 1)
(0, 0, 1, 1)
(0, 0, 1, 0)
(0, 1, 1, 0)
(0, 1, 1, 1)
(0, 1, 0, 1)
(1, 1, 0, 1)
(1, 0, 0, 1)
(1, 0, 0, 0)
(1, 0, 1, 0)
(1, 0, 1, 1)
(1, 1, 1, 1)
(1, 1, 1, 0)
(1, 1, 0, 0)
(0, 1, 0, 0)

Monday, 16 January 2017

Latex Dictionary Template

I recently needed a latex dictionary template, there are a couple around on the internet, but they are all under non-commercial licences so I couldn't use them. Fortunately it is pretty easy to make one yourself, I have provided one here. Consider it to be under the wtfpl licence.

A short example of what the definitions look like:

%Copyright 2017 James Lyons
%This work is free. You can redistribute it and/or modify it under the
%terms of the Do What The Fuck You Want To Public License, Version 2,
%as published by Sam Hocevar. See http://www.wtfpl.net/ for more details.
\documentclass[twoside]{book}
\usepackage[utf8]{inputenc}
\usepackage{multicol}
\usepackage{fancyhdr} 
\usepackage[bindingoffset=0.2in,margin=0.75in,paperwidth=5.25in,paperheight=8in]{geometry}

% set the headers so that first mark is always on left, second on right
% this is so that each word is put in the header correctly 
\pagestyle{fancy} 
\fancyhead[LO,LE]{\rightmark}
\fancyhead[RE,RO]{\leftmark}
% set the font
\renewcommand*\rmdefault{bch}
\newcommand{\comment}[1]{}
% remove chapter numbering
\usepackage{titlesec}
\titleformat{\chapter}
  {\Large\bfseries} % format
  {}                % label
  {0pt}             % sep
  {\huge}           % before-code
% remove section numbering
\setcounter{secnumdepth}{0}
% this is how we define the actual dictionary items. markboth is for 
% setting the page headers so that first and last words appear there.
\newcommand{\ditem}[1]{\item[#1] \markboth{\footnotesize \textbf{#1}}{\footnotesize \textbf{#1}}}
\begin{document}
%%%%%%%%%%%%%%%%%%%%%%%% title page
\begin{center}

\thispagestyle{empty}
\vbox{\vspace{5cm}
{\huge The Dictionary Title}}
\vfill
{\large Author}
\vfill

\end{center}
\clearpage
%%%%%%%%%%%%%%%%%%%%%%%% table of contents
\tableofcontents

%%%%%%%%%%%%%%%%%%%%%%%% Preface
\chapter{Preface}

This is the preface.

\chapter{The Dictionary}
% we'd like the text size to be a bit smaller for the definitions
\footnotesize
% we'd like two columns of definitions
\begin{multicols}{2}

\begin{description}
\section{A}
\ditem{Abandon} \textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abandoned} \textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abbey}\textit{-v.} definition 1 \textit{-n.} definition 2
\ditem{Abide} \textit{-v.} definition 1 \textit{-n.} definition 2
\end{description}

\end{multicols}
 
\end{document}

Sunday, 15 January 2017

Creating crosswords with python-constraint [part 2]

In the previous post I was generating the patterns of white and black squares. In this post I'd like to actually fill the patterns with words so that everything matches up. This sort of problem is very well suited to a constraint satisfaction solver, as it can try many possibilities until something is found that works. The trick is specifying the constraints in an efficient manner. My first solution was to implement it the same way as the magic word squares, but this was really inefficient and there is a much better way.

I put the code for this on github, it's a bit messy but works: https://github.com/jameslyons/crossword_gen.

The trick is you don't need to represent all the different squares in the crossword as separate variables and put the constraints on those, your actual variables will be indexes into the word list, and the constraints will be whether e.g. character 4 of word 1 is the same as character 2 of word 7. These constraints will obviously be different for every crossword and will have to be generated each time, but this isn't too hard.

The inputs from the previous step are hex strings e.g.

3EF9C51EBFAFAAF8FE37577BFB9D5CEFEF75763F8FAAFAFEBC51CFBEg

This can be converted to a binary string by replacing e.g. 3 with '0011', same as decoding hex anywhere. The function I use is down the bottom of the previous post. This gives a binary string that can be represented as the white and black squares of the crossword.

Example run

The code takes as argument the hex representation of the pattern. It first prints a representation of the crossword with 'X's where the letters go, then once it gets solved it outputs the solved crossword, along with the list of words that got used. The dictionary I am using has some weird words in it, so you need to pay some attention to that if you want nice crosswords (I recommend using something like the 3of6game.txt from the 12dicts wordlist, it is specifically for games). The final thing that gets printed is the crossword represented as a string, this is passed to some typesetting code I have that produces nice pdfs of the crosswords.

>pypy cw_fill.py BEFB4516BFAFAAF8FE27573BFBDD5DEFEE75723F8FAAFAFEB4516FBEh
building word lists...
X   X X X X X   X X X X X   X
X   X       X   X       X   X
X   X   X X X X X X X   X   X
X X X X   X   X   X   X X X X
X       X X X X X X X       X
    X X X   X   X   X X X
X X X   X X X X X X X   X X X
X   X X X   X   X   X X X   X
X X X   X X X X X X X   X X X
    X X X   X   X   X X X
X       X X X X X X X       X
X X X X   X   X   X   X X X X
X   X   X X X X X X X   X   X
X   X       X   X       X   X
X   X X X X X   X X X X X   X
starting...
count: 0
s . c r u m b . r e m a n . b
w . r . . . r . i . . . o . r
a . a . p h o e b u s . s . i
m o w n . e . l . r . g y m s
i . . . c r i m i n e . . . k
. . p o o . n . n . v e g . .
a v o . i n f a u n a . i l k
s . s u n . l . t . d i g . i
p h i . a l a n i n e . o f t
. . t a g . m . l . r a t . .
i . . . e y e l e s s . . . o
m o t s . e . o . p . w a t t
i . o . s t o p g a p . m . t
d . r . . . a . e . . . i . a
e . c l a c k . l u l l s . r

['swami', 'asp', 'imide', 'craw', 'posit', 'torc', 'coinage', 'her', 'yet', 'bro',
 'inflame', 'oak', 'elm', 'lop', 'rib', 'inutile', 'gel', 'urn', 'spa', 'evaders',
 'nosy', 'gigot', 'amis', 'brisk', 'kit', 'ottar', 'crumb', 'reman', 'phoebus',
 'mown', 'gyms', 'crimine', 'poo', 'veg', 'avo', 'infauna', 'ilk', 'sun', 'dig',
 'phi', 'alanine', 'oft', 'tag', 'rat', 'eyeless', 'mots', 'watt', 'stopgap',
 'clack', 'lulls']
solution time = 1.05939997405
s.crumb.reman.bw.r...r.i...o.ra.a.phoebus.s.imown.e.l.r.gymsi...crimine...k..p
oo.n.n.veg..avo.infauna.ilks.sun.l.t.dig.iphi.alanine.oft..tag.m.l.rat..i...ey
eless...omots.e.o.p.watti.o.stopgap.m.td.r...a.e...i.ae.clack.lulls.r

The code

This takes the hex string output from the pattern generation step as an argument. It uses a wordlist called 'fewerwords.txt'. Each time it is run it should give different results because the word indexes are shuffled each time. Sometimes it will take a long time to find a solution, I suggest just killing it and running it again. Most of the time this will get a new starting point that allows it to be solved very quickly, though some patterns are just harder to solve than others.

from __future__ import print_function
from constraint import *
from timeit import default_timer as timer
from itertools import combinations
from math import sqrt
from test_bin2hex import hex2bin
import sys

''' to run from pypy
pypy cw_fill.py 3EF9C51EBFAFAAF8FE37577BFB9D5CEFEF75763F8FAAFAFEBC51CFBEg
'''
# this program will take a crossword pattern and try to fill it with words

if len(sys.argv) < 2: print('supply pattern as argument')
else:
    pattern = hex2bin(sys.argv[1]) 
    pattern = [int(i) for i in pattern]

s = int(sqrt(len(pattern)))

print('building word lists...')
# extract the word positions from the pattern
wordlocs = []
hwordlocs = []
vwordlocs = []
for i in range(s):
    hword =  []
    vword = []
    for j in range(s):
        if pattern[i*s + j] == 0: print(' ',end=' ')
        else: print('X',end=' ')
        if pattern[i*s + j] == 1:
            hword.append(i*s + j)
        else:
            if len(hword) > 1: hwordlocs.append(hword)
            hword = []         
        
        if pattern[j*s + i] == 1:
            vword.append(j*s + i)
        else:
            if len(vword) > 1: vwordlocs.append(vword)
            vword = []
        
        if j == s-1:
            if len(vword) > 1 and vword not in vwordlocs: vwordlocs.append(vword)
            if len(hword) > 1 and hword not in hwordlocs: hwordlocs.append(hword)
            
    print('')
wordlocs.extend(vwordlocs)
wordlocs.extend(hwordlocs)

# now that we have the words, add constraints so each word is in the dictionary
maxL = 10

wordlist = []
f = open("fewwords.txt")
for line in f:
    w = line.split()[0].lower()
    wordlist.append(w)

wordlen = []
olen = 0
for c,word in enumerate(wordlist):
    if olen < len(word):
        wordlen.append(c)
    olen = len(word)

problem = Problem()
from random import shuffle

for c,w in enumerate(wordlocs):
    L = len(w)-3
    r = list(range(wordlen[L],wordlen[L+1]))
    shuffle(r)
    problem.addVariables([c], list(r))

problem.addConstraint(AllDifferentConstraint())

def eq(a,b):
    def temp(c1,c2):
        word1 = wordlist[c1]
        word2 = wordlist[c2]
        if word1[a]==word2[b]: return True
        return False
    return temp
        
for c1,word1 in enumerate(wordlocs):
    for c2,word2 in enumerate(wordlocs):
        if word1==word2: continue
        intsc = list(set(word1).intersection(set(word2)))
        if len(intsc) > 0:
            a = word1.index(intsc[0])
            b = word2.index(intsc[0])
            problem.addConstraint(eq(a,b),[c1,c2])    
        
print('starting...')
        
def getwords(solution):
    listq = []
    for locs in wordlocs:
        word = ''.join([solution[loc] for loc in locs])
        listq.append(word)        
    return listq
              
# pretty print the output
start = timer()
b = ''
for count,a in enumerate(problem.getSolutionIter()):  
    print('count: '+str(count))
    sol = ['.' for i in range(s*s)]
    for key,val in a.iteritems():
        for c,char in enumerate(wordlist[val]):
            sol[wordlocs[key][c]] = char      
    b=sol
    for i in range(s):
        for j in range(s): 
            print(sol[i*s+j],end=' ')
        print('')
    print('')  
    # get the words we have used
    zz = getwords(sol)
    print(zz)
    break
end = timer()
print("solution time = " + str(end-start))
print(''.join([str(b[i]) for i in range(s*s)]))

Creating Crosswords with python-constraint [part 1]

Intro

I put the code for this on github, it's a bit messy but works: https://github.com/jameslyons/crossword_gen.

In previous posts I have been playing with python-constraint, and building crosswords is something that interests me for some reason. I decided to do it in a two step process: step 1 generate the patterns of white and black squares, step 2 fill the patterns with letters. This post will deal with step 1 only, the next post will deal with step 2.

Generating the patterns is well suited to using a constraint solver since you can just specify the things you'd like to be true of the final result, then ask the constraint solver to do it for you. In this case we have e.g. 15 by 15 binary variables, we choose the constraints so that the final pattern looks like a crossword puzzle.

What do pretty patterns look like

I am interested in automatically generating patterns that look like nice crossword patterns e.g.

Some of the things I'd like to be true of the patterns I generate:

  1. I'd like 4-fold symmetry, because I think it looks good.
  2. No 2x2 blocks of white squares.
  3. No 2x2 blocks of black squares.
  4. No length 1 words.
  5. No length 2 words.
  6. No words longer than MAXLEN.
  7. No sequences of black squares longer than 3.
  8. White squares must be fully connected.

Some of these conditions are pretty easy to implement with python-constraint, the fully connected constraint I can just apply to the final solutions and discard them at that point. By fully connected I mean you can get from one square to every other square in the pattern, so you can't get disjoint islands.

The implementation

Examples of what the program outputs:

3EF9C51EBFAFAAF8FE37577BFB9D5CEFEF75763F8FAAFAFEBC51CFBEg
X X . . . . . X . . . . . X X
. . . X X X . X . X X X . . .
. X . X . . . . . . . X . X .
. . . . X . X . X . X . . . .
. X X X . . . . . . . X X X .
. X . . . X . X . X . . . X .
. . . X . . . . . . . X . . .
X X . . . X . X . X . . . X X
. . . X . . . . . . . X . . .
. X . . . X . X . X . . . X .
. X X X . . . . . . . X X X .
. . . . X . X . X . X . . . .
. X . X . . . . . . . X . X .
. . . X X X . X . X X X . . .
X X . . . . . X . . . . . X X

The hex string is the binary to hex encoding of the pattern, 'X's are zeros, '.'s are ones. The hex string will be passed to step 2 which will fill in the words using a wordlist.

It's pretty easy to generate a few million example patterns with this program (if you are willing to wait a day), which is what I suggest doing. Once you have a heap of patterns you can further filter them by whatever criteria you want e.g. no more than X 3-letter words, no more than X% black squares etc. Once you have the filtered list, you can try filling them with words.

from __future__ import print_function
from constraint import *
from itertools import combinations
from math import sqrt,floor

# this program will create crossword patterns, but not fill it with words
# i.e. just patterns of zeros and ones for the black and white squares

# tests if a solution is fully connected i.e. can every point be reached from
# any random point. True if fullyconnected, false if disjoint parts.
# Could easily be made more efficient, but this works fine.
def isfullyconnected(solution):
    L = int(sqrt(len(solution)))
    # get list of all non-zero points
    plist = []
    for i in range(L):
        for j in range(L):
            if solution[i*L + j]: plist.append((i,j))
    
    # start top right corner, try to get to every other point
    reached = {plist[0]:1}
    rlen = len(reached)
    oldrlen = 0
    while rlen != oldrlen:
      temp = list(reached.keys())
      for i,j in temp:
        if (i+1,j) in plist: reached[(i+1,j)]=1
        if (i-1,j) in plist: reached[(i-1,j)]=1
        if (i,j+1) in plist: reached[(i,j+1)]=1
        if (i,j-1) in plist: reached[(i,j-1)]=1
      oldrlen = rlen
      rlen = len(reached)
    # if rlen is not eq number of all points, we couldn't reach some points
    if rlen < len(plist): return False
    return True    
    
from random import randint
    
def getPattern(sidelen=15,maxlen=9):
    problem = Problem()
    s = sidelen
    maxWordLen = maxlen
    problem.addVariables(list(range(s*s)), [0,1])
  
    # ensure there are no 2x2 blocks of ones or zeros
    def no4x4block(*x):
        if sum(x)==0: return False
        if sum(x)==len(x): return False
        else: return True

    for i in range(s-1):
        for j in range(s-1):
            problem.addConstraint(no4x4block,[i*s+j,(i*s+j)+1,(i*s+j)+s,(i*s+j)+s+1])
    
    #ensure matrix is symmetric  
    def eq(a,b):  
        return a==b

    for i in range(s):
        for j in range(s):
            problem.addConstraint(eq,[i*s+j,(s-i-1)*s+j])
            problem.addConstraint(eq,[i*s+j,i*s+(s-j-1)])
            problem.addConstraint(eq,[i*s+j,(s-i-1)*s+(s-j-1)])
       
    #ensure no words longer than maxWordLen
    def maxlen(*x):
        if sum(x) == len(x): return False
        else: return True

    seq = list(range(maxWordLen+1))
    for i in range(s - maxWordLen):
        for j in range(s):
            problem.addConstraint(maxlen,[(k+i)+j*s for k in seq])
            problem.addConstraint(maxlen,[(k+i)*s+j for k in seq])
    
    # no two letter words
    def no2letter(*x):
        if x == (0,1,1,0): return False
        else: return True
    
    seq = list(range(4))
    for i in range(s - 3):
        for j in range(s):
            problem.addConstraint(no2letter,[(k+i)+j*s for k in seq])
            problem.addConstraint(no2letter,[(k+i)*s+j for k in seq])

    # no two letter words around the edge either
    def no2letterEdge(*x):
        if x == (1,1,0): return False
        else: return True
    
    seq = list(range(4))
    for i in range(s):
        problem.addConstraint(no2letterEdge,[i,i+s,i+2*s])
        problem.addConstraint(no2letterEdge,[i*s,i*s+1,i*s+2])
        problem.addConstraint(no2letterEdge,[i*s+s-1,i*s+s-2,i*s+s-3])
        problem.addConstraint(no2letterEdge,[i+s*(s-1),i+s*(s-2),i+s*(s-3)])
    
    # no one letter squares
    def no1letter(*x):    
        if sum(x) == 0: return True
        if sum(x[1:]) == 0: return False
        return True
    
    seq = list(range(4))
    for i in range(s):
      for j in range(s):
        vals = [i*s+j]
        if i-1 >= 0: vals.append((i-1)*s+j) 
        if i+1 < s: vals.append((i+1)*s+j) 
        if j-1 >= 0: vals.append(i*s+(j-1)) 
        if j+1 < s: vals.append(i*s+(j+1)) 
        problem.addConstraint(no1letter,vals)
 
    # no more than three 0's in a row
    def nothreeblank(*x):
        if sum(x) == 0: return False
        return True
    
    seq = list(range(4))
    for i in range(s - 3):
        for j in range(s):
            problem.addConstraint(nothreeblank,[(k+i)+j*s for k in seq])
            problem.addConstraint(nothreeblank,[(k+i)*s+j for k in seq])
   
    for sol in problem.getSolutionIter():  
        if not isfullyconnected(sol): continue
        yield(sol)
    
from test_bin2hex import bin2hex
import sys

print(len(sys.argv))
if len(sys.argv)<2: sidelen = 11
else: sidelen = int(sys.argv[1])
print('using sidelen',sidelen)

f = open('pattern'+str(sidelen)+'.txt','w')
for count,sol in enumerate(getPattern(sidelen=sidelen,maxlen=8)):
    f.write(bin2hex(''.join([str(sol[a]) for a in range(sidelen*sidelen)]))+'\n')
    if count % 100 == 0: 
      print('count: '+str(count))
      print(bin2hex(''.join([str(sol[a]) for a in range(sidelen*sidelen)])))
      for i in range(sidelen):
        for j in range(sidelen): 
            if sol[i*sidelen+j]==1: print('.',end=' ')
            else: print('X',end=' ')
        print('')
      print('')
    #break

The hex string is generated using the following python script (test_bin2hex.py). The extra letters are so we can encode binary strings that aren't necessarily divisible by 4.


bin = ['0000','0001','0010','0011','0100','0101','0110','0111','1000','1001','1010','1011','1100','1101','1110','1111',
       '0','1','00','01','10','11','000','001','010','011','100','101','110','111']
hex = "0123456789ABCDEFghijklmnopqrst"

def bin2hex(binstr):
    map = dict(zip(bin,hex))
    ret = ''
    for i in range(0,len(binstr),4):
        sub = binstr[i:i+4]
        ret += map[sub]
    return ret
    
def hex2bin(hexstr):
    map = dict(zip(hex,bin))
    ret = ''
    for i in hexstr:
        ret += map[i]
    return ret

Monday, 2 January 2017

Magic Word Squares with python-constraint

This post will deal with the seemingly boring topic of magic word squares - I define these to be squares of letters where each row is a dictionary word, and each column is also a dictionary word. In my wordlist, there are 3,641 4-letter words (I talk about how I generated the wordlist below). How many possible 4 by 4 word squares do you think there are? Would you believe 90.6 million? I was rather surprised to discover there were so many. What about 8 by 8 word squares (given there are 12,317 8 letter words in the wordlist)? would you believe 0? It turns out it gets much harder to find word squares as you increase the side length.

I'll display some of the word squares and other stats before I display the python code I used to generate all these things further down.

Some example word squares

a z o
n o w
t o n
f u z z
a l o e
g a r s    
s n i t
f r i z z
l a r e e
o d o r s
p i n o t
s i s s y
z y g o t e
y e l p e d
g l u i n g
o p i a t e
t e n t e r
e d g e r s
e s c a p e s
s e a l a n t
c a s e t t e
a l e r t e r
p a t t e r n
e n t e r a l
s t e r n l y

How many word squares are there?

I counted how many word squares there are for each side length. I was suprised how many there are for 4x4 squares, but then there are no 8x8 squares or higher. The No. of words is how many words of that particular length occur in the wordlist.

Side LengthNo. of wordsNo. of squares
31,0142,607,858
43,64190,669,454
56,68310,991,077
69,55968,518
711,96098
812,3170
911,1810
108,760didn't check
116,296didn't check
124,082didn't check
132,493didn't check
141,326didn't check
15720didn't check

Building the wordlist

I identified two wordlists which I might use: the scrabble dictionary, and googles billion word corpus. Unfortunately, the scrabble dictionary has so many weird words I've never heard of before, and google's dataset has a heap of misspellings and gibberish. To get my wordlist I found the intersection of the two, with the hope being that the result will be just reasonably common proper English words.

The code

I used python-constraint to solve the squares. A basic (very slow) implementation is shown below

from constraint import *
problem = Problem()
s = 6
problem.addVariables(range(s*s), list("abcdefghijklmnopqrstuvwxyz"))

# build the wordlist from the wordlist text file
wordlist = {}
f = open("fewer_words.txt")
for line in f:
    w = line.split()[0].lower()
    wordlist[w] = 1

# we need a function: true if word is in wordlist, false if not
# this is characterwise checking, so a sequence of chars is provided, 
# we need to join them into a string before checking if they are in the list
def inlist(*word): 
    w = ''.join(word)
    if w in wordlist: return True
    return False
   
# add row and col constraints
for i in range(0,s): problem.addConstraint(inlist,range(s*i,s*i+s))
for i in range(0,s): problem.addConstraint(inlist,range(i,s*s,s))

# pretty print the output
a = problem.getSolution()
for i in range(s):
    for j in range(s): print a[i*s+j],
    print('')

It can be sped up a lot by adding way more constraints on what bigrams, trigrams etc are possible. I apologise for the scruffy code, I was more concerned with getting it to work efficiently than making it look nice

from constraint import *
from timeit import default_timer as timer
from itertools import combinations

problem = Problem()
s = 3
problem.addVariables(list(range(s*s)), list("abcdefghijklmnopqrstuvwxyz"))

wordlist = {}
f = open("fewer_words.txt")
for line in f:
    w = line.split()[0].lower()
    if len(w) != s: continue 
    wordlist[w] = 1

# build a list of monograms in each position
mono = []
for j in range(s): mono.append({})
for i in wordlist.keys():
    for j in range(s):
        mono[j][i[j]] = 1
    
# build a list of bigrams at each pair of positions
bgpos = list(combinations(list(range(s)),2))
bg = {}
for j in bgpos: bg[j] = {}
for i in wordlist.keys():
    for j in bgpos:
        bg[j][i[j[0]]+i[j[1]]] = 1
        
tripos = list(combinations(list(range(s)),3))
tri = {}
for j in tripos: tri[j] = {}
for i in wordlist.keys():
    for j in tripos:
        tri[j][i[j[0]]+i[j[1]]+i[j[2]]] = 1

qpos = list(combinations(list(range(s)),4))
q = {}
for j in qpos: q[j] = {}
for i in wordlist.keys():
    for j in qpos:
        q[j][i[j[0]]+i[j[1]]+i[j[2]]+i[j[3]]] = 1

fpos = list(combinations(list(range(s)),5))
f = {}
for j in fpos: f[j] = {}
for i in wordlist.keys():
    for j in fpos:
        f[j][i[j[0]]+i[j[1]]+i[j[2]]+i[j[3]]+i[j[4]]] = 1

def opposite():
    return lambda x,y: x==y
    
# we need a function: true if word is in wordlist, false if not
def inlist(*word): 
    w = ''.join(word)
    if w in wordlist: return True
    return False
   
def inmono(i):
    return lambda x: x in mono[i]
    
def inbg(i,j):
    return lambda *x: ''.join(x) in bg[(i,j)]   
   
def intri(i,j,k):
    return lambda *x: ''.join(x) in tri[(i,j,k)]   

def inq(i,j,k,l):
    return lambda *x: ''.join(x) in q[(i,j,k,l)]   

def inf(i,j,k,l,m):
    return lambda *x: ''.join(x) in f[(i,j,k,l,m)]   
    
# add row and col constraints
for i in range(0,s): problem.addConstraint(inlist,list(range(s*i,s*i+s)))
for i in range(0,s): problem.addConstraint(inlist,list(range(i,s*s,s)))

# add row monogram constraints
for j in range(s):
    for i in range(s):
        problem.addConstraint(inmono(i),[j*s +i])
        problem.addConstraint(inmono(i),[i*s +j])

#print(bg.keys())
for j in range(s):
    for a,b in bgpos:
        problem.addConstraint(inbg(a,b),[j*s +a,j*s +b])
        problem.addConstraint(inbg(a,b),[a*s +j,b*s +j])

for j in range(s):
    for a,b,c in tripos:
        problem.addConstraint(intri(a,b,c),[j*s +a,j*s +b,j*s +c])
        problem.addConstraint(intri(a,b,c),[a*s +j,b*s +j,c*s +j])

for j in range(s):
    for a,b,c,d in qpos:
        problem.addConstraint(inq(a,b,c,d),[j*s +a,j*s +b,j*s +c,j*s +d])
        problem.addConstraint(inq(a,b,c,d),[a*s +j,b*s +j,c*s +j,d*s +j])

for j in range(s):
    for a,b,c,d,e in fpos:
        problem.addConstraint(inf(a,b,c,d,e),[j*s +a,j*s +b,j*s +c,j*s +d,j*s +e])
        problem.addConstraint(inf(a,b,c,d,e),[a*s +j,b*s +j,c*s +j,d*s +j,e*s +j])

# pretty print the output
start = timer()
for count,a in enumerate(problem.getSolutionIter()):
  if count % 1000 == 0: print 'count:',count   
  for i in range(s):
    for j in range(s): print a[i*s+j],
    print('')
  print('')  
end = timer()

print("solution time = " + str(end-start))
print count

for i in range(s):
    for j in range(s): print a[i*s+j],
    print ''
print ''  

Tuesday, 22 November 2016

Constraint Programming with python-constraint

I have recently been looking at constraint programming and I thought I would post some of my code as I go. This is using the python-constraint package.

Doing things with python-constraint is pretty easy. To constrain a variable to have a certain value, we can use InSetConstraint. To make sure all the variables in a set are different, we use the AllDifferentConstraint. These are enough to solve sudoku.

Sudoku

I don't think Sudoku needs any introduction. The constraints are fairly easy to specify: each row must be all different, each col must be all different and each 3x3 block must be all different. Also we have to ensure that the given starting state is used, for this we use equality constraints. We use 1 variable for each number in the sudoku square, so 81 variables all up.

from constraint import *
problem = Problem()
problem.addVariables(range(9*9), range(1,10))

# specify the given starting sudoku problem here, dots are blank squares
sudoku = '\
.42986.1.\
....2...5\
......82.\
4.1..793.\
.683.254.\
.758..1.2\
.39......\
7...4....\
.1.67325.'

# Make sure our solution matches the specified starting thing
for i,char in enumerate(sudoku):
    if char != '.':  problem.addConstraint(InSetConstraint([int(char)]),[i])
    
# add row and col constraints
for i in range(9): 
    problem.addConstraint(AllDifferentConstraint(),range(9*i,9*i+9))
for i in range(9): 
    problem.addConstraint(AllDifferentConstraint(),range(i,81,9))
# add block constraints
for i in [0,3,6,27,30,33,54,57,60]: 
    ind = []
    for j in range(3):
        for k in range(3):
            ind.append(i + j*9 + k)
    problem.addConstraint(AllDifferentConstraint(),ind)

# pretty print the output
a = problem.getSolution()
for i in range(9):
    for j in range(9): print a[i*9+j],
    print ''

The Golomb Ruler

A bit of info on Golomb Rulers: here. If you know how to improve the code below, leave a comment. Golomb rulers are described by their order, which is the number of marks on their edge. The length of a Golomb ruler is the distance between the outer two marks and represents the longest distance it can measure.

There is no requirement that a Golomb ruler measures all distances up to their length. However, if a ruler does measure all distances, it is classified as a perfect Golomb ruler. There are no perfect Golomb rulers above order 4.

Finally, a Golomb ruler is described as optimal if no shorter ruler of the same order exists. The code below first looks for perfect rulers, then increments length each time until an optimal ruler is found. It starts to slow down by order = 7, it takes a few minutes.

This problem is harder than the sudoku problem, we have to use a set of auxiliary variables to represent the pairwise differences, then apply the AllDifferentConstraint to the auxiliary variables.

from constraint import *
from itertools import combinations

order = 7 # the number of marks on the Golomb ruler
diffs = [] # the set of pairwise differences between marks
for i in combinations(range(order),2): diffs.append(i)
length = len(diffs)

def lessthan(a,b): return a < b
def diffeq(a,b,res): return (b-a)==res     

while True:
    print 'trying length: '+str(length)
    problem = Problem()
    problem.addVariables(range(order), range(0,length+1))
    problem.addVariables(diffs,    range(1,length+1))

    # make sure the first mark is 0 and last is length
    problem.addConstraint(InSetConstraint([0]),[0])
    problem.addConstraint(InSetConstraint([length]),[order-1])

    # ensure that the marks are in increasing order
    # make sure all possible pairwise relations are constrained
    for i in diffs: problem.addConstraint(lessthan,[i[0],i[1]])

    # ensure that the differences reflect the marks
    for i in diffs: problem.addConstraint(diffeq,[i[0],i[1],i])
    
    # all the differences must be different    
    problem.addConstraint(AllDifferentConstraint(),diffs)
   
    # pretty print the output
    solutions = problem.getSolutions()
    for ruler in solutions:
        for mark in range(order): print ruler[mark],
        print ''
    if len(solutions) > 0: break # if we find some solutions, quit
    length = length + 1

example output for order = 7, note the shortest ruler is length 25, though a perfect ruler would be length 21:

trying length: 21
trying length: 22
trying length: 23
trying length: 24
trying length: 25
0 4 9 15 22 23 25
0 3 4 12 18 23 25
0 2 3 10 16 21 25
0 2 5 14 18 24 25
0 2 6 9 14 24 25
0 2 7 13 21 22 25
0 2 7 15 21 24 25
0 1 4 10 18 23 25
0 1 7 11 20 23 25
0 1 11 16 19 23 25

Magic Squares

Magic Squares seem to be very common examples when doing constraint programming, so I may as well include the code for them. Info on Magic Squares can be found here. This code is quite slow, I thought it would be much faster. This may be because python-constraint is just kind of slow, or maybe the problem is harder than I thought.

from constraint import *

problem = Problem()
size = 4
problem.addVariables(range(size*size), range(1,size*size+1))
total = size*(size*size + 1)/2

# make sure rows add to total
for i in range(size): 
    problem.addConstraint(ExactSumConstraint(total),range(i*size,i*size+size))
for i in range(size): 
    problem.addConstraint(ExactSumConstraint(total),range(i,size*size,size))
# add the diagonal constraints
diag1,diag2 = [],[]
for i in range(size):
    diag1.append(i*size + i)
    diag2.append(i*size + (size-i-1))

problem.addConstraint(ExactSumConstraint(total),diag1)
problem.addConstraint(ExactSumConstraint(total),diag2)
problem.addConstraint(AllDifferentConstraint())
   
# pretty print the output
solution = problem.getSolution()
for i in range(size):
    for j in range(size):
        print solution[i*size + j],
    print ''

Monday, 14 November 2016

Terraforming Venus

I thought I would look at some of the figures involved in terraforming Venus, discussions of which can be found all over the place e.g. here. The numbers from this page come from here. Venus is the planet next to earth as you go towards the Sun. It has a year of 224.7 Earth days, and a day length of 243 Earth days (longer than the year!). It also rotates in the opposite direction to most other planets, likely due to some large collision early in its life.

I'll quote a little bit from wikipedia: Venus has an extremely dense atmosphere composed of 96.5% carbon dioxide, 3.5% nitrogen, and traces of other gases, most notably sulfur dioxide. The mass of its atmosphere is 93 times that of Earth's, whereas the pressure at its surface is about 92 times that at Earth's—a pressure equivalent to that at a depth of nearly 1 kilometre under Earth's oceans. The density at the surface is 65 kg/m3, 6.5% that of water or 50 times as dense as Earth's atmosphere at 20 °C at sea level. The CO2-rich atmosphere generates the strongest greenhouse effect in the Solar System, creating surface temperatures of at least 735 K (462 °C).

Terraforming Venus would involve removing most of the sulfuric acid and carbon dioxide, needless to say this is a monumental task that we don't currently have the technology for. Some of the truly staggering figures:

Total Mass of Venus Atmosphere = 4.80E+20 kg
Percent Atmosphere CO2 = 96.50%
Total Mass of CO2 = 4.63E+20 kg
For comparison:
Total Mass of Earth's Atmosphere = 5.10E+18 kg
Mass Ratio Venus to Earth = 94.118

Utilizing the Bosch reaction (CO2 + 2H2 -> C + 2H2O) , combining hydrogen with carbon dioxide to make carbon graphite and water could be used to remove Venus' carbon dioxide. This reaction requires the introduction of iron, cobalt or nickel as a catalyst and requires a temperature level of 530-730 degrees Celsius. Since the reaction releases some heat, it could be self sustaining if you could supply enough hydrogen. A problem with this is that that the production of elemental carbon tends to foul the catalyst's surface, which is detrimental to the reaction's efficiency [ref].

Molecular Weight of CO2 = 44
Molecular Weight of 2*H2 = 4
Total Initial Molecular Weight = 48
Molecular Weight of C (graphite) = 12
Molecular Weight of 2*H2O = 36
Total Final Molecular Weight = 48

You could use water from e.g. comets to get hydrogen for the reaction. You would have to use Electrolysis to split the water and initiate the reaction. Alternatively you could reuse the water output from the Bosch reaction , recover the Hydrogen and continue. But this way you wouldn't have oceans on Venus at the end.

If you were to provide hydrogen for the reaction (4E19 kg), this massive reaction would create an ocean nearly as large as 1/4 of the Earth’s ocean:

Ratio of 2*H2O to CO2 (36 / 44) = 0.818
Resultant Mass of H2O = 3.79E+20 kg
Density of H2O = 1.000 g / cm^3
= 1,000,000.000 g / M^3
= 1,000.000 kg / M^3
Volume of Resultant H2O = 3.79E+17 M^3
= 3.79E+08 kM^3
Area of Venus Surface = 4.602E+08 kM^2
Average Depth of H2O = 0.824 kM
Total Volume of Earth's Oceans = 1.300E+09 kM^3
Average Depth of Earth's Oceans = 3.682 kM

It would also result in the deposition of a layer of graphite with an average thickness over the entire surface of Venus roughly equal to a 40 story building:

Ratio of graphite to CO2 (12 / 44) = 0.273
Resultant Mass of graphite = 1.26E+20 kg
Density of C (graphite) = 2.230 g / cm^3
= 2,230,000.000 g / M^3
= 2,230.000 kg / M^3
Volume of Resultant C (graphite) = 5.66E+16 M^3
= 5.66E+07 kM^3
Area of Venus Surface = 4.602E+08 kM^2
Average Depth of C (graphite) = 0.123 kM

So after converting all the carbon dioxide to graphite, you would be left with ~120 meter thick blanket of carbon over the planet. Though this may not be terrible, carbon makes a pretty good base for soils. There would still be a great deal of atmospheric pressure due to all the oxygen though, and a spark would result in a mighty conflagration that would undo all your work. A better idea may be to trap the carbon dioxide as carbonate rocks, but for this you would need large quantities of calcium and/or magnesium (About 8×10^20 kg of calcium or 5×10^20 kg of magnesium would be required). This would have the advantage of removing a lot of the oxygen, which would lower the atmospheric pressure. Unfortunately Magnesium carbonate begins to decompose at 350 degrees C, so Calcium might be a better bet. You would also need to remove all the sulphuric acid since it would dissolve all your carbonate rocks, using up the sulphuric acid but producing more carbon dioxide.

Of course you have to power all this as well, electrolysis doesn't come cheap. Solar panels can work well on Venus because the sunlight is 4 times stronger. Perhaps some genetically engineered plant...