Generating Sudoku Variations

Category: Publishing blog

Generating and solving Sudoku puzzles often uses a programming technique developed in the 1960s known as ‘dancing links’. Conceptually, the technique creates a large set of ‘constraints’. As an example, ‘the top row contains the digit one in one of those nine cells’ is one constraint, ‘the top row contains the digit two in one of those nine cells’ and so on for each digit in each row, column and 3x3 box. The final set of constraints is that each cell contain a digit. The program then makes an assumption and sees if that leads to each constraint being true when all the digits are filled in. If the assumption leads to a problem, found when a constraint is no longer possible, then that assumption is discarded and another one tries.

Dancing links worked fine for Trividokus since we just had to add another constraint for the Sudoku cells. Sudoku Islands also worked since each island was also just another set of nine constraints.

With Stars & Stripes Sudoku, we went from 9x9 to 12x12, where you have the digits 1-9 and three stars. This was a little trickier, but not too bad. The new constraints were now ‘the top row contains THREE stars’, a slight change from a single digit, but it still was within the framework.

In Sudoku Atlantis, we had to switch to a new technique. Some of the new cell requirements – even, odd, greater than 6, less than 5 and so on could have been processed. We just had to remove the invalid digits, but our ‘magic 5’ requirement made it ugly. With the magic 5, any space with a five filled in did not ‘allow’ larger numbers immediately above it in the grid. Since this requirement was not static, it only came into being when a ‘5’ was filled in, there was sensible constraint for it.

Keeping the concept of constraints, we inverted the process. Each cell in the grid was now attached to a list of constraints. For example, the upper left cell is attached to three ‘Standard9’ constraints. Each of these constraints is tied to nine grid cells and when asked to fill in a digit, goes into the grid and ‘writes in’ the digit, then erases that as a possibility for the remaining eight spaces. If while doing this, it finds a space can no longer have any digits, then we know there’s a problem with this assumption and we undo our changes and try something else. If erasing a digit leaves exactly one possibility, then we recursively treat that as a new assumption and continue from there.

In addition to the standard nine digit constraints, we then add in our variants. A space that only allows even digits has a very simple constraint. If you try to fill in an odd digit it returns a fail. It’s actually a little more efficient than that, but that’s the gist of it. For a constraint like the ‘magic 5’ we put in for Atlantis, every cell has that constraint which ‘know’ the adjacent spaces and, if a 5 is filled in, we check above; likewise, if a 6-9 is filled in, we check below.

From our perspective, generating, solving and validating puzzles are all based on the same process. When generating, first set up a grid and say that it’s possible for each cell to have the digits 1-9 in it. Next, we apply the constraints for each cell. For variants, we use an app to lay out the constraints for the puzzle. Maybe we want these three spaces to have only even digits and these three to have only odd. These constraints, when applied, remove the odd/even digits as possibilities from the cells they’re in.

Generating a puzzle always starts by picking a random spot and randomly selecting one of the possible digits. When we put that digit in, we then remove that digit as a possibility from the other cells in that row, column and box, as well as anything else the constraints for that cell may require.

We then pick another cell and try one of the possible digits there. If it leads to a contradiction, that is some cell now has all possible digits erased, we know that digit can’t be placed there so we try another. If we run out of digits to try, we then know the previous assumption was wrong. This process continues until we’ve placed digits in every space in the grid. Normally, this process is very fast, although there have been times when we’ve messed things up in the layout and the puzzle is impossible to generate. As an example, if we accidentally put five ‘even only’ constraints in a column, the puzzle is impossible to generate.

Once we have a puzzle generated, it’s time to start erasing. The goal here is to erase enough digits so that the puzzle still has exactly one solution – and isn’t too incredibly hard. A normal Sudoku can go down to about sixteen ‘givens’ (the digits pre-printed in the puzzle) before it becomes ambiguous. Our puzzles can get down to eight digits, although they’re likely to be too difficult to be any fun to solve.

The process for erasing is always the same. We randomly pick a space and erase it, then feed the resultant puzzle into our solver. The solver then uses the constraints and finds ALL possible solutions for the puzzle. If it finds more than one solution, then we know that erasing the cell made the puzzle ambiguous. Ambiguous is bad – when you solve a puzzle the solution you get should always agree with the one in the back of the book.

As a side effect of the solving, we also determine how hard the puzzle is. When erased, we go through and reduce the cells to having exactly the possible set of digits that can go into the cell. We apply a formula to these cells. A cell that has only one possibility (even though it’s erased) has a very small weight towards the difficulty; a cell that has nine possibilities puts in a very large weight. Since we prefer to have the ‘easier’ puzzles first in the book, ending up with the toughest, this gives us a way to measure that.

Once the puzzle is done and saved into our database, it’s time to publish. Our constraints play another part here, since the other effect the interesting variants have – such as even and odd – is that we usually drop a graphic into the cell effected. The ‘even’ constraint drops a faded “E” into the even cells, ‘odd’ drops a faded “O” into the cell.

Before we publish, though, we run a validation step. While it’s ‘impossible’ for bad puzzles to be generated, we still do another check of every puzzle. Why? Because sometimes coding changes are made here and there that have unexpected side effects.

Some people call them ‘bugs’, but side effects just sounds better.

 

 

The making of a Cryptic Word Search - Part 4

Category: Publishing blog

Publishing our puzzles involves yet another aspect of our app. The goal is to create a PDF file that we can upload for publishing. We use Adobe InDesign, a layout program we’ve used for books since around 2005.

The more recent riddle books we’ve published under the Cloud Kingdom Games imprint, as well as all of Sudoku-USA’s Sudoku books have used InDesign. While the older books were generated with copy and paste, starting in 2005, we began using InDesign’s automation features.

The first, and most time consuming part for the books is creating the template document. This is essentially a desktop publishing document with the introduction, page numbering, chapter headers and what-not in place, together with essentially blank pages associated with templates known as ‘master pages’. We use two main ones, one for puzzle pages, where the word searches are placed, and another template for the answer pages.

To get the puzzles onto the pages, we make use of JavaScript, the same coding language used in just about every web site on the internet. InDesign can run JavaScript and so our app takes a puzzle and creates Typescript (a newer form of JavaScript) to tell InDesign how to publish the puzzles. Effectively, we tell InDesign to go to page xxx, find the spot on the page we want the puzzle to be on, then create the word grid. Then go to the spot for the answers and fill them in.

Our Cryptic Wordsearch #2 book’s script was about 13,000 lines long, only about 500 hundred of which were hand generated: the functions written to do the placement. The remaining lines of code were generated by our publishing app from other templates. For each puzzle, we generated a new “program” from the template which made a bunch of calls to functions, each of which was designed to do one particular part of the publishing process.

When the script finishes, we have a draft of the book. Yay! Except now we get to see what went wrong. The first draft is always fairly iffy. The templates for puzzles are generic, good for a puzzle of average size. If the puzzle is larger or the clues are too wordy, things don’t fit well and we often need to shrink or grow the space allocated for the puzzle, or perhaps change the font size for the answers, maybe make a column wider or … we play with the document until things look good. To make things even more fun, as a general rule, we try to make formatting changes only to the ‘base document’, the one with the layout, but empty space for the puzzles. So it’s an iterative process: run the script against the base document to make a draft, review the draft for problems (or just aesthetics), modify the base, run the script again. 

But then, sooner or later, it’s time to upload the PDF and discover we’ve overlooked something that the publisher (Amazon Create Space) doesn’t like about the PDF. A few years ago, their errors were something like ‘there are characters beyond the printable margin’, leading to the fun game of ‘guess what they’re thinking’. These days, they at least tell you what page the problem is on.

With the contents ready and a cover created, it’s time to publish!

The making of a Cryptic Word Search - Part 3

Category: Publishing blog

Once the word list for the puzzle has been entered, it’s time to generate the puzzle.

I’ve been programming since personal computers were assembled from kits, so the code for the app I use is all homemade. The program has been through many iterations since I first started developing these word searches, changing each time new variations were added. Each puzzle can either be a vanilla word search: words are placed normally or can have special code for a variation ‘injected’ into the generation and publishing code.

We first start by creating an empty grid of the size we want the puzzle to be. We normally start with a 15 x 15 grid, since that’s big enough for most words to fit.

The program than takes the longest word and tries to fit it into the grid.

To optimize this process, we come up with all possible placements for a word of that length. For example, if the word is ten letters long, then there are five ways in a 15 x 15 to place it going left to right in each row and five going right to left. Similarly, there are five ways to place it top down in a column, and five ways to place it bottom up. The diagonals are similar, but there are fewer placements since we need to have at least ten spaces.

With this placement figured out, we then choose one randomly and put the word in.

We then continue the process with the remaining words. The program makes a list of the possible placements and then rejects any where the wrong letters cross. So if a word has an “E” and the grid already has a “D” there, that is not a possible placement.

After discarding the impossibly placements, we then try to pick the placement which overlaps the most other words already placed in the grid. The intention is to make the puzzle as “dense” as possible, with as many letters overlapping as we can. There’s one caveat to that: we don’t want a word to overlap more than one letter of a different word. So if we have the words “GOLDEN” and “ENTANGLE”, we don’t want the “EN” to overlap. We could do that, but it’s easier for the puzzle solver if they don’t.

While we’re doing our insertions, we’re also trying to keep the different placements fairly balanced. That is, if there are six answers appearing left to right in a row, we also want about six going right to left, six going top to bottom, six bottom up and six each on the four diagonals. With 30 answers going in, it’s not possible to have them exactly equal, so we shoot for the count of the fewest in any of the eight directions being no less than two from the most found in the other directions.

If we’re lucky, we make it to the end and we’ve fit all the words in. Usually, when trying to make a tight puzzle, one without too many extra spaces, we discover after putting 25 or so words in that the remaining words can’t be placed. In this case, we pop backwards and try different placements for words. Since we’re choosing our placements randomly, this can make a puzzle take a long time to generate. If it continues to fail, we’ll often stop the process and add a row or column to the puzzle and try again.

Once the puzzle has been generated, it’s time to put in the phrase. In most cases, we start off the puzzle without the hidden phrase in place so we can determine how many empty spaces are in the puzzle where we can put the phrase. If there are too many spaces, that means the grid is too large and we’ll cut off a row or column and generate it again.

When the puzzle has a decent number of blank spaces, usually around 60 characters, we’ll play around with our intended hidden message, rephrasing it until it’s the right length. At times, we can’t get the perfect length, so we’ll regenerate the word search grid so it has a different number of empty spaces.

With the phrase in place, the next step is to validate the puzzle. Using the same placements we used for putting words into the grid, we now go back and make sure that every answer appears exactly once in the puzzle. The puzzle generator should have done a good job of this, but it’s possible that when the phrase was inserted, we inadvertently put in letters which made the same word appear more than once.

Now that the puzzle has been generated and validated, it goes into the database, ready for the publishing step.

 

The making of a Cryptic Word Search - part 2

Category: Publishing blog

Part 2

Once I have a word list for my Cryptic Wordsearch puzzle, it’s time to get working on the title and clues. The title is another cryptic clue, this time one which introduces the theme of the puzzle. For some puzzles, I’ve been told that once you figure out the title, things start making more sense. As an example, if the theme was “bread”, then you start thinking in certain ways. “Rye”, “Sourdough”, “Sliced” and other answers would be easier to see. Of course, “Money” might also be an answer…

Hopefully, when I start on the answers for the puzzle itself, I’ll have plenty to pick from. If it’s only 30 answers (or worse, less), then it gets more challenging. My first pass through is to look for what are often the hardest to come up with from a themed list, double definitions and sounds-like (homophones).

Double definitions are always nice because they often lend themselves to puns. “Pirate digs up his buried torso?” (5)

Homophones are usually hardest to find because, again, I’m constrained by the theme. If I only have 30-40 words, chances are none of them will be a stand-alone homophone.

If I’m lucky I’ll find a way to use some combination of double definitions and homophones together with other wordplay, but overall I’ve discovered that those are the hardest words to pull out of most themes.

On the other hand, some of the themes we use are more general – “starts with the letter ‘s’”, could be one, in which case life becomes easier.

After the first pass, it’s time to go back and look for the more general wordplay, anagram and hidden word clues. Hidden words are generally the easiest, at least in a forward direction. Which I’d deny discovering most of the time (6).  Reversed hidden words are the next thing to check for, and for both of these, I often use a program I’ve created (and rewritten a few times).

The first puzzle books Sudoku-USA published were – surprise, surprise – Sudoku books, but using that program framework I added an ‘anagram’ feature. I went out and got a particularly bad list of dictionary words and wrote code that let me put in a word or sentence and it would come up with possible anagrams. The dictionary, which contained not only English but random foreign words – as well as misspellings – let me at least get ideas for anagrams.

Over time, I expanded the anagram feature to do more general lookups, many of which are important for hidden words. I added the ability to look for ‘words starting with …’, which helps me when I have the first few letters and want to get the next. I also have ‘ends with …’ which helps with the leading portion. For reversed hidden words, I just turn things around and it all works.

As to general word play, cutting and pasting, inserting and all the other tricks, I still use my anagram window. One of its other important features is that if I put in the answer and below it the letters comprising the results of the clue, it will tell me if they match – in any order.

Let’s say the answer is “implied”. I put that in the window and then say that the clue components are “I’m piled” (You might think I’m piled messily) and it will tell me that ‘impiled’ are all part of the clue. That’s not a very good clue, way too easy, but it’s the kind of thing my program is good for recognizing.

When I’ve got my 30 candidate clues, it’s time to start entering them in the database. I create a puzzle “shell”, a place to put the clues, and then start entering them. As I do, the program checks to see if I’ve ever used that answer – or even one similar – in a previous puzzle. As a rule, we don’t like having the same answer appear in the same book, and also check against previous books and sample puzzles. If you’ve solved as many cryptic crosswords as I have, you’ve probably seen ones you KNOW you’ve seen before. My goal is to prevent that from happening. If I do reuse an answer, the clue will be significantly different.

Once the clues are entered it’s time to generate the puzzle. I’ll cover that in Part 3.

The making of a Cryptic Word Search - part 1

Category: Publishing blog

Part 1

A cryptic word search is a combination between a Cryptic Crossword and a Word Search puzzle.

Typically, a cryptic crossword is in a lattice style grid, with lots of individual black spaces separating the cross and down answers, unlike most US "normal" crosswords where words intersect more often. Clues for a cryptic crossword have two parts, a wordplay portion and a definition. When solved, the answers are entered into a grid as you would with an ordinary crossword, with intersecting letters providing a guide for if you're right or wrong.

Standard word search puzzles provide a filled-in grid of letters together with a list of words - usually all part of some theme - that you then circle. In many word searches, the letters that are not circled then spell out a "secret message".

Our cryptic word-searches combine these two concepts. Instead of providing a list of words, we provide a list of cryptic crossword style clues. When you solve the clue, you then search through the word grid looking for the answer, which may appear left to right, right to left, top to bottom, bottom to top or in any of the four diagonal directions. We also place a secret message in the grid, however unlike most word searches where you read the message left to right, top to bottom, ours may be inserted - otherwise.

Most cryptic word-searches are themed, and the title, presented as another cryptic clue, provides a hint to that theme.

Coming up with new and interesting themes can be a bit of a challenge. For the most part, we prefer not to repeat themes from book to book, so each new book needs 40 new themes. There's also the problem that a theme can be too restrictive, that it can make the answers too similar. For example, let's say the theme was "colors". We could get away with "ruby" as a gem (and a color), "ebony" as a magazine, "green" as a vegetable, but since our standard puzzles contain 30 different answers, we'd run out of decent definitions (and probably colors) far too soon.

Ideally, we come up with a list of 50+ answers for a theme. That allows us to go through them and find answers that lend themselves to wordplay, puns, double meanings and all the others tricks we use. 

In part 2, I'll discuss the tech we use with our puzzles, technology that allows us to develop the best clues and with the fewest repeated answers (and sometimes clues), unlike many of the cryptic crossword books currently in print.