In the previous post I was generating the patterns of white and black squares. In this post I'd like to actually fill the patterns with words so that everything matches up. This sort of problem is very well suited to a constraint satisfaction solver, as it can try many possibilities until something is found that works. The trick is specifying the constraints in an efficient manner. My first solution was to implement it the same way as the magic word squares, but this was really inefficient and there is a much better way.
I put the code for this on github, it's a bit messy but works: https://github.com/jameslyons/crossword_gen.
The trick is you don't need to represent all the different squares in the crossword as separate variables and put the constraints on those, your actual variables will be indexes into the word list, and the constraints will be whether e.g. character 4 of word 1 is the same as character 2 of word 7. These constraints will obviously be different for every crossword and will have to be generated each time, but this isn't too hard.
The inputs from the previous step are hex strings e.g.
This can be converted to a binary string by replacing e.g. 3 with '0011', same as decoding hex anywhere. The function I use is down the bottom of the previous post. This gives a binary string that can be represented as the white and black squares of the crossword.
The code takes as argument the hex representation of the pattern. It first prints a representation of the crossword with 'X's where the letters go, then once it gets solved it outputs the solved crossword, along with the list of words that got used. The dictionary I am using has some weird words in it, so you need to pay some attention to that if you want nice crosswords (I recommend using something like the 3of6game.txt from the 12dicts wordlist, it is specifically for games). The final thing that gets printed is the crossword represented as a string, this is passed to some typesetting code I have that produces nice pdfs of the crosswords.
>pypy cw_fill.py BEFB4516BFAFAAF8FE27573BFBDD5DEFEE75723F8FAAFAFEB4516FBEh building word lists... X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X X starting... count: 0 s . c r u m b . r e m a n . b w . r . . . r . i . . . o . r a . a . p h o e b u s . s . i m o w n . e . l . r . g y m s i . . . c r i m i n e . . . k . . p o o . n . n . v e g . . a v o . i n f a u n a . i l k s . s u n . l . t . d i g . i p h i . a l a n i n e . o f t . . t a g . m . l . r a t . . i . . . e y e l e s s . . . o m o t s . e . o . p . w a t t i . o . s t o p g a p . m . t d . r . . . a . e . . . i . a e . c l a c k . l u l l s . r ['swami', 'asp', 'imide', 'craw', 'posit', 'torc', 'coinage', 'her', 'yet', 'bro', 'inflame', 'oak', 'elm', 'lop', 'rib', 'inutile', 'gel', 'urn', 'spa', 'evaders', 'nosy', 'gigot', 'amis', 'brisk', 'kit', 'ottar', 'crumb', 'reman', 'phoebus', 'mown', 'gyms', 'crimine', 'poo', 'veg', 'avo', 'infauna', 'ilk', 'sun', 'dig', 'phi', 'alanine', 'oft', 'tag', 'rat', 'eyeless', 'mots', 'watt', 'stopgap', 'clack', 'lulls'] solution time = 1.05939997405 s.crumb.reman.bw.r...r.i...o.ra.a.phoebus.s.imown.e.l.r.gymsi...crimine...k..p oo.n.n.veg..avo.infauna.ilks.sun.l.t.dig.iphi.alanine.oft..tag.m.l.rat..i...ey eless...omots.e.o.p.watti.o.stopgap.m.td.r...a.e...i.ae.clack.lulls.r
This takes the hex string output from the pattern generation step as an argument. It uses a wordlist called 'fewerwords.txt'. Each time it is run it should give different results because the word indexes are shuffled each time. Sometimes it will take a long time to find a solution, I suggest just killing it and running it again. Most of the time this will get a new starting point that allows it to be solved very quickly, though some patterns are just harder to solve than others.
from __future__ import print_function from constraint import * from timeit import default_timer as timer from itertools import combinations from math import sqrt from test_bin2hex import hex2bin import sys ''' to run from pypy pypy cw_fill.py 3EF9C51EBFAFAAF8FE37577BFB9D5CEFEF75763F8FAAFAFEBC51CFBEg ''' # this program will take a crossword pattern and try to fill it with words if len(sys.argv) < 2: print('supply pattern as argument') else: pattern = hex2bin(sys.argv) pattern = [int(i) for i in pattern] s = int(sqrt(len(pattern))) print('building word lists...') # extract the word positions from the pattern wordlocs =  hwordlocs =  vwordlocs =  for i in range(s): hword =  vword =  for j in range(s): if pattern[i*s + j] == 0: print(' ',end=' ') else: print('X',end=' ') if pattern[i*s + j] == 1: hword.append(i*s + j) else: if len(hword) > 1: hwordlocs.append(hword) hword =  if pattern[j*s + i] == 1: vword.append(j*s + i) else: if len(vword) > 1: vwordlocs.append(vword) vword =  if j == s-1: if len(vword) > 1 and vword not in vwordlocs: vwordlocs.append(vword) if len(hword) > 1 and hword not in hwordlocs: hwordlocs.append(hword) print('') wordlocs.extend(vwordlocs) wordlocs.extend(hwordlocs) # now that we have the words, add constraints so each word is in the dictionary maxL = 10 wordlist =  f = open("fewwords.txt") for line in f: w = line.split().lower() wordlist.append(w) wordlen =  olen = 0 for c,word in enumerate(wordlist): if olen < len(word): wordlen.append(c) olen = len(word) problem = Problem() from random import shuffle for c,w in enumerate(wordlocs): L = len(w)-3 r = list(range(wordlen[L],wordlen[L+1])) shuffle(r) problem.addVariables([c], list(r)) problem.addConstraint(AllDifferentConstraint()) def eq(a,b): def temp(c1,c2): word1 = wordlist[c1] word2 = wordlist[c2] if word1[a]==word2[b]: return True return False return temp for c1,word1 in enumerate(wordlocs): for c2,word2 in enumerate(wordlocs): if word1==word2: continue intsc = list(set(word1).intersection(set(word2))) if len(intsc) > 0: a = word1.index(intsc) b = word2.index(intsc) problem.addConstraint(eq(a,b),[c1,c2]) print('starting...') def getwords(solution): listq =  for locs in wordlocs: word = ''.join([solution[loc] for loc in locs]) listq.append(word) return listq # pretty print the output start = timer() b = '' for count,a in enumerate(problem.getSolutionIter()): print('count: '+str(count)) sol = ['.' for i in range(s*s)] for key,val in a.iteritems(): for c,char in enumerate(wordlist[val]): sol[wordlocs[key][c]] = char b=sol for i in range(s): for j in range(s): print(sol[i*s+j],end=' ') print('') print('') # get the words we have used zz = getwords(sol) print(zz) break end = timer() print("solution time = " + str(end-start)) print(''.join([str(b[i]) for i in range(s*s)]))