Intro

I put the code for this on github, it's a bit messy but works: https://github.com/jameslyons/crossword_gen.

In previous posts I have been playing with python-constraint, and building crosswords is something that interests me for some reason. I decided to do it in a two step process: step 1 generate the patterns of white and black squares, step 2 fill the patterns with letters. This post will deal with step 1 only, the next post will deal with step 2.

Generating the patterns is well suited to using a constraint solver since you can just specify the things you'd like to be true of the final result, then ask the constraint solver to do it for you. In this case we have e.g. 15 by 15 binary variables, we choose the constraints so that the final pattern looks like a crossword puzzle.

What do pretty patterns look like

I am interested in automatically generating patterns that look like nice crossword patterns e.g.

Some of the things I'd like to be true of the patterns I generate:

I'd like 4-fold symmetry, because I think it looks good.
No 2x2 blocks of white squares.
No 2x2 blocks of black squares.
No length 1 words.
No length 2 words.
No words longer than MAXLEN.
No sequences of black squares longer than 3.
White squares must be fully connected.

Some of these conditions are pretty easy to implement with python-constraint, the fully connected constraint I can just apply to the final solutions and discard them at that point. By fully connected I mean you can get from one square to every other square in the pattern, so you can't get disjoint islands.

The implementation

Examples of what the program outputs:


3EF9C51EBFAFAAF8FE37577BFB9D5CEFEF75763F8FAAFAFEBC51CFBEg
X X . . . . . X . . . . . X X
. . . X X X . X . X X X . . .
. X . X . . . . . . . X . X .
. . . . X . X . X . X . . . .
. X X X . . . . . . . X X X .
. X . . . X . X . X . . . X .
. . . X . . . . . . . X . . .
X X . . . X . X . X . . . X X
. . . X . . . . . . . X . . .
. X . . . X . X . X . . . X .
. X X X . . . . . . . X X X .
. . . . X . X . X . X . . . .
. X . X . . . . . . . X . X .
. . . X X X . X . X X X . . .
X X . . . . . X . . . . . X X

The hex string is the binary to hex encoding of the pattern, 'X's are zeros, '.'s are ones. The hex string will be passed to step 2 which will fill in the words using a wordlist.

It's pretty easy to generate a few million example patterns with this program (if you are willing to wait a day), which is what I suggest doing. Once you have a heap of patterns you can further filter them by whatever criteria you want e.g. no more than X 3-letter words, no more than X% black squares etc. Once you have the filtered list, you can try filling them with words.


from __future__ import print_function
from constraint import *
from itertools import combinations
from math import sqrt,floor

# this program will create crossword patterns, but not fill it with words
# i.e. just patterns of zeros and ones for the black and white squares

# tests if a solution is fully connected i.e. can every point be reached from
# any random point. True if fullyconnected, false if disjoint parts.
# Could easily be made more efficient, but this works fine.
def isfullyconnected(solution):
    L = int(sqrt(len(solution)))
    # get list of all non-zero points
    plist = []
    for i in range(L):
        for j in range(L):
            if solution[i*L + j]: plist.append((i,j))
    
    # start top right corner, try to get to every other point
    reached = {plist[0]:1}
    rlen = len(reached)
    oldrlen = 0
    while rlen != oldrlen:
      temp = list(reached.keys())
      for i,j in temp:
        if (i+1,j) in plist: reached[(i+1,j)]=1
        if (i-1,j) in plist: reached[(i-1,j)]=1
        if (i,j+1) in plist: reached[(i,j+1)]=1
        if (i,j-1) in plist: reached[(i,j-1)]=1
      oldrlen = rlen
      rlen = len(reached)
    # if rlen is not eq number of all points, we couldn't reach some points
    if rlen < len(plist): return False
    return True    
    
from random import randint
    
def getPattern(sidelen=15,maxlen=9):
    problem = Problem()
    s = sidelen
    maxWordLen = maxlen
    problem.addVariables(list(range(s*s)), [0,1])
  
    # ensure there are no 2x2 blocks of ones or zeros
    def no4x4block(*x):
        if sum(x)==0: return False
        if sum(x)==len(x): return False
        else: return True

    for i in range(s-1):
        for j in range(s-1):
            problem.addConstraint(no4x4block,[i*s+j,(i*s+j)+1,(i*s+j)+s,(i*s+j)+s+1])
    
    #ensure matrix is symmetric  
    def eq(a,b):  
        return a==b

    for i in range(s):
        for j in range(s):
            problem.addConstraint(eq,[i*s+j,(s-i-1)*s+j])
            problem.addConstraint(eq,[i*s+j,i*s+(s-j-1)])
            problem.addConstraint(eq,[i*s+j,(s-i-1)*s+(s-j-1)])
       
    #ensure no words longer than maxWordLen
    def maxlen(*x):
        if sum(x) == len(x): return False
        else: return True

    seq = list(range(maxWordLen+1))
    for i in range(s - maxWordLen):
        for j in range(s):
            problem.addConstraint(maxlen,[(k+i)+j*s for k in seq])
            problem.addConstraint(maxlen,[(k+i)*s+j for k in seq])
    
    # no two letter words
    def no2letter(*x):
        if x == (0,1,1,0): return False
        else: return True
    
    seq = list(range(4))
    for i in range(s - 3):
        for j in range(s):
            problem.addConstraint(no2letter,[(k+i)+j*s for k in seq])
            problem.addConstraint(no2letter,[(k+i)*s+j for k in seq])

    # no two letter words around the edge either
    def no2letterEdge(*x):
        if x == (1,1,0): return False
        else: return True
    
    seq = list(range(4))
    for i in range(s):
        problem.addConstraint(no2letterEdge,[i,i+s,i+2*s])
        problem.addConstraint(no2letterEdge,[i*s,i*s+1,i*s+2])
        problem.addConstraint(no2letterEdge,[i*s+s-1,i*s+s-2,i*s+s-3])
        problem.addConstraint(no2letterEdge,[i+s*(s-1),i+s*(s-2),i+s*(s-3)])
    
    # no one letter squares
    def no1letter(*x):    
        if sum(x) == 0: return True
        if sum(x[1:]) == 0: return False
        return True
    
    seq = list(range(4))
    for i in range(s):
      for j in range(s):
        vals = [i*s+j]
        if i-1 >= 0: vals.append((i-1)*s+j) 
        if i+1 < s: vals.append((i+1)*s+j) 
        if j-1 >= 0: vals.append(i*s+(j-1)) 
        if j+1 < s: vals.append(i*s+(j+1)) 
        problem.addConstraint(no1letter,vals)
 
    # no more than three 0's in a row
    def nothreeblank(*x):
        if sum(x) == 0: return False
        return True
    
    seq = list(range(4))
    for i in range(s - 3):
        for j in range(s):
            problem.addConstraint(nothreeblank,[(k+i)+j*s for k in seq])
            problem.addConstraint(nothreeblank,[(k+i)*s+j for k in seq])
   
    for sol in problem.getSolutionIter():  
        if not isfullyconnected(sol): continue
        yield(sol)
    
from test_bin2hex import bin2hex
import sys

print(len(sys.argv))
if len(sys.argv)<2: sidelen = 11
else: sidelen = int(sys.argv[1])
print('using sidelen',sidelen)

f = open('pattern'+str(sidelen)+'.txt','w')
for count,sol in enumerate(getPattern(sidelen=sidelen,maxlen=8)):
    f.write(bin2hex(''.join([str(sol[a]) for a in range(sidelen*sidelen)]))+'\n')
    if count % 100 == 0: 
      print('count: '+str(count))
      print(bin2hex(''.join([str(sol[a]) for a in range(sidelen*sidelen)])))
      for i in range(sidelen):
        for j in range(sidelen): 
            if sol[i*sidelen+j]==1: print('.',end=' ')
            else: print('X',end=' ')
        print('')
      print('')
    #break

The hex string is generated using the following python script (test_bin2hex.py). The extra letters are so we can encode binary strings that aren't necessarily divisible by 4.



bin = ['0000','0001','0010','0011','0100','0101','0110','0111','1000','1001','1010','1011','1100','1101','1110','1111',
       '0','1','00','01','10','11','000','001','010','011','100','101','110','111']
hex = "0123456789ABCDEFghijklmnopqrst"

def bin2hex(binstr):
    map = dict(zip(bin,hex))
    ret = ''
    for i in range(0,len(binstr),4):
        sub = binstr[i:i+4]
        ret += map[sub]
    return ret
    
def hex2bin(hexstr):
    map = dict(zip(hex,bin))
    ret = ''
    for i in hexstr:
        ret += map[i]
    return ret

jimblog

Pages

Sunday, 15 January 2017

Creating Crosswords with python-constraint [part 1]

Intro

What do pretty patterns look like

The implementation

1 comment: