This is my research report. I've included a lot of the code and chat interactions for people to read through if interested. I worked on this crossword https://www.theguardian.com/crosswords/quick/16553
I had a vision for a GPT-powered crossword solver. My idea was that it would do a tree search over GPT-generated guesses, with each prompt including the letters known so far, like:
I didn't end up doing that because ChatGPT and GPT-4 are terrible at questions involving the length of words, or at guessing words that contain specific letters at specific positions. They sometimes get these right but usually fail. I think this is because they are token-based. I am curious whether a character-based LLM would be better at such tasks.
Here are a couple of examples of GPT-4 trying to guess from clues plus givens:
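Since the model is unreliable at length and letter-position constraints, those checks can be done outside the model instead. Below is a minimal sketch of that filtering step, not the code from this project. The function names and the "_" convention for unknown letters are assumptions made for illustration.

```python
def matches_pattern(word, pattern):
    """Return True if word fits a pattern such as 'c_t' ('_' marks an unknown letter)."""
    if len(word) != len(pattern):
        return False
    return all(p == "_" or p == w for w, p in zip(word.lower(), pattern.lower()))


def filter_candidates(candidates, pattern):
    """Keep only the guesses that are consistent with the letters known so far."""
    return [word for word in candidates if matches_pattern(word, pattern)]


# Hypothetical model guesses for a three-letter slot with 'c' and 't' already known.
guesses = ["cat", "cot", "dog", "coat"]
print(filter_candidates(guesses, "c_t"))  # ['cat', 'cot']
```

With a filter like this the model only has to propose plausible answers, and wrong lengths or clashing letters are rejected mechanically.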
(Thanks to nat.dev for the GPT-4 access)
Here's a fun asciinema to watch of the partial crossword solver script: https://asciinema.org/a/oEURFRS5BfVHEKN9GjjW13HKe
I started working on this before I had realized that, though. The first thing I did was generate a program to split an image of the grid into cells, parse them, and turn the result into a Python list of lists. I tried to use Tesseract OCR to recognize the numbers but it was absolutely terrible at that. I switched over to Cuneiform and it also failed.
can you please make a program that splits an image into a row*col grid and process each cell individually. it should see if the cell is mostly white, mostly black, or apply tesseract OCR on it if it seems to have text in it. The output should be a json file representing the grid. " " for empty, "#" for black, number if it contained a number
This tool is actually great. It managed to convert the picture to:
But because I was having trouble with the OCR, I had it save every cell it split as an individual PNG for me to investigate. I tried tesseract on the individual cells and improved performance by cropping the black border, so I asked ChatGPT to update the script to do this. Then I asked again, this time to pad the cropped image with a white border. Still I couldn't get 100% accuracy on the OCR.
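The splitting, border cropping and white padding described above can be sketched without an imaging library. This is not the generated script. It assumes the image has already been loaded as a 2D list of grayscale values (0 is black, 255 is white), and the function names, margin and brightness thresholds are guesses that would need tuning on a real scan.

```python
def split_into_cells(pixels, rows, cols):
    """Split a 2D list of grayscale values into rows*cols equally sized cells."""
    cell_h = len(pixels) // rows
    cell_w = len(pixels[0]) // cols
    return [
        [
            [line[c * cell_w:(c + 1) * cell_w]
             for line in pixels[r * cell_h:(r + 1) * cell_h]]
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def trim_border(cell, margin):
    """Drop `margin` pixels from every side to remove the black grid lines."""
    return [line[margin:len(line) - margin]
            for line in cell[margin:len(cell) - margin]]


def pad_white(cell, n, white=255):
    """Surround a cell with an n-pixel white border."""
    width = len(cell[0]) + 2 * n
    top = [[white] * width for _ in range(n)]
    bottom = [[white] * width for _ in range(n)]
    middle = [[white] * n + list(line) + [white] * n for line in cell]
    return top + middle + bottom


def classify_cell(cell, margin=1, dark=64, light=230):
    """Return '#' for a black cell, ' ' for an empty one, '?' if it may hold text."""
    inner = trim_border(cell, margin)
    values = [v for line in inner for v in line]
    mean = sum(values) / len(values)
    if mean < dark:
        return "#"
    if mean > light:
        return " "
    return "?"  # not clearly black or white: hand this cell to OCR


# Tiny synthetic 8x8 "image": a 2x2 grid of 4x4-pixel cells.
W, B = 255, 0
pixels = [[W] * 4 + [B] * 4 for _ in range(4)] + [[B] * 4 + [W] * 4 for _ in range(4)]
cells = split_into_cells(pixels, 2, 2)
print([[classify_cell(cell) for cell in row] for row in cells])
```

Only the cells marked "?" would then be trimmed, padded and passed to the OCR engine.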
Here's an example prompt I used:
add a 2 px white border to a from PIL import Image
Another idea I considered was cropping away all the whitespace around the numbers:
python PIL Image. i want to crop the white parts of the sides away
This gave a completely broken program. I didn't bother trying to reprompt or explain the bug to it, and I didn't pursue this idea.
Here are some OCR test cases:
I also tried LLaVA on a whim, and it didn't get it either:
On the way I had a couple of bash scripts generated for me, with prompts like "write a bash script to split a dictionary file into a set of files 1.txt 2.txt 3.txt etc collecting up words of each length". This was very helpful.
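The generated bash script is not reproduced here. For reference, the same word-length split can be sketched in Python. The function names are made up for this example, and skipping entries with apostrophes or digits is an assumption about what is wanted in a crossword wordlist.

```python
from collections import defaultdict
from pathlib import Path


def group_words_by_length(words):
    """Map each word length to the list of words of that length."""
    groups = defaultdict(list)
    for word in words:
        word = word.strip().lower()
        if word.isalpha():  # assumption: drop entries like "don't" or "1st"
            groups[len(word)].append(word)
    return dict(groups)


def write_length_files(dictionary_path, out_dir):
    """Write 1.txt, 2.txt, 3.txt, ... into out_dir, one word per line."""
    words = Path(dictionary_path).read_text(encoding="utf-8").splitlines()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for length, group in group_words_by_length(words).items():
        (out / f"{length}.txt").write_text("\n".join(group) + "\n", encoding="utf-8")


print(group_words_by_length(["cat", "dog", "bird", "a's"]))
```

Calling write_length_files on a system wordlist would give one file per word length, which is the form the solver loop consumes (for example words_len6 and words_len4).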
I worked with ChatGPT to generate a small library of functions for dealing with a crossword grid in Python. I wrote an example of a nested loop for trying out clues and backtracking on failure, then asked ChatGPT to transform it into a recursive procedure. It got very close, with just a couple of lines wrong.
Here is a prompt I used to generate the preprocessing function that extracts every clue number from our python list-of-lists crossword representation, building up a map of their coordinates:
python. i have a 2d array representing a crossword grid.
" " represents an empty cell, "#" represents a filled cells, some cells have numbers in them for the clues
want to loop through it and find each number, replace those with an empty cell, but also create a hash
table that maps numbers to row,col coordinates
...
[ChatGPT generated code omitted]
...
that's amazing. nice work. can you also measure how far across and down it is possible to go from each
number
...
[omitted]
...
write a crossword_get and crossword_set function that takes a number, a boolean representing down/across
and a word, it places the word into the grid.
refactor the get/set functions to make use of the number_coordinates
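Since the generated code is omitted above, here is a minimal sketch of what helpers matching these prompts could look like. It is a reconstruction under assumptions, not the library from this project: crossword_set, crossword_get and number_coordinates are names from the prompts, extract_numbers and run_length are invented, and treating the boolean as "down" is a guess at its meaning.

```python
def extract_numbers(grid):
    """Blank out numbered cells in place and return {number: (row, col)}."""
    number_coordinates = {}
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if cell.strip().isdigit():
                number_coordinates[int(cell)] = (r, c)
                grid[r][c] = " "
    return number_coordinates


def run_length(grid, row, col, down):
    """Count open cells from (row, col) until a '#' or the edge of the grid."""
    length = 0
    while row < len(grid) and col < len(grid[row]) and grid[row][col] != "#":
        length += 1
        if down:
            row += 1
        else:
            col += 1
    return length


def crossword_set(grid, number, down, word, number_coordinates):
    """Place word at a clue number; return False and leave the grid alone if it clashes."""
    row, col = number_coordinates[number]
    if len(word) != run_length(grid, row, col, down):
        return False
    cells = [(row + i, col) if down else (row, col + i) for i in range(len(word))]
    if any(grid[r][c] not in (" ", ch) for (r, c), ch in zip(cells, word)):
        return False
    for (r, c), ch in zip(cells, word):
        grid[r][c] = ch
    return True


def crossword_get(grid, number, down, number_coordinates):
    """Read back the current contents of a slot, with spaces for unknown letters."""
    row, col = number_coordinates[number]
    n = run_length(grid, row, col, down)
    return "".join(grid[row + i][col] if down else grid[row][col + i] for i in range(n))


grid = [["1", " ", "2"], [" ", "#", " "], ["3", " ", " "]]
coords = extract_numbers(grid)
print(coords)
print(crossword_set(grid, 1, False, "cat", coords), grid[0])
```

Returning a boolean from crossword_set is what lets a search loop try a word and simply move on when it conflicts with letters already placed.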
The idea of 'amplifying' ChatGPT's intelligence/capabilities by putting it into a tree search loop seems plausible but didn't pan out in this case. What actually ended up being most effective was just pasting the whole set of clues in and asking it for a few guesses per clue. I asked it to transform that into a JSON object for me to stick into my program. I used a full wordlist for the unknowns (the ones where ChatGPT's guesses were rubbish or clearly didn't include the answer). And that program was able to solve about 80% of the crossword.
Then I just finished it manually with a little help from GPT-4.
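As an illustration of combining model guesses with a wordlist fallback, here is a small sketch. The JSON layout, the clue keys and the rule "fall back when no guess has the right length" are all assumptions made for this example. In the project itself the decision to use the full wordlist for a clue was made by hand.

```python
import json


def build_candidates(guesses_json, wordlists, lengths):
    """Return {clue: candidate words}, preferring model guesses of the right length.

    guesses_json: JSON text mapping a clue key to a list of guessed words (assumed format).
    wordlists:    {length: [words]} taken from the per-length dictionary files.
    lengths:      {clue key: answer length} taken from the grid.
    """
    guesses = json.loads(guesses_json)
    candidates = {}
    for clue, length in lengths.items():
        fitting = [w.lower() for w in guesses.get(clue, []) if len(w) == length]
        # No usable guess: try every word of that length instead.
        candidates[clue] = fitting or wordlists.get(length, [])
    return candidates


guesses_json = '{"3-across": ["Banana", "apple"], "2-down": ["pear"]}'
lengths = {"3-across": 6, "2-down": 5}
wordlists = {5: ["grape", "lemon"], 6: ["orange"]}
print(build_candidates(guesses_json, wordlists, lengths))
```

Short guess lists keep the search small, while the full wordlist keeps a clue solvable when the guesses miss.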
Let me show the code I input to ChatGPT:
for word in words_len6:
    modified_grid = copy.deepcopy(original_grid)
    if crosswordlib.crossword_set(modified_grid, 3, False, word, number_coordinates):
        clear_terminal()
        print(":{}".format(word))
        for row in modified_grid:
            print(" ".join(row))
        snooze()
        for word in words_len4:
            sub_modified_grid = copy.deepcopy(modified_grid)
            if crosswordlib.crossword_set(sub_modified_grid, 2, True, word, number_coordinates):
                clear_terminal()
                for row in sub_modified_grid:
                    print(" ".join(row))
                snooze()
and then I asked it to transform that into a recursive function:
so this processes just 2 crossword clues. We want to generalize this to process any length.
Can you write function that takes in an array like [(words_len6, 3, False), (words_len4, 32, True)] and processes them. support any length of array, not just length 2.
Here is what it came up with:
import copy

def process_crossword_clues(clue_list, original_grid, number_coordinates):
    modified_grid = copy.deepcopy(original_grid)
    for clue in clue_list:
        words_list, start_position, is_horizontal = clue
        for word in words_list:
            sub_modified_grid = copy.deepcopy(modified_grid)
            if crosswordlib.crossword_set(sub_modified_grid, start_position, is_horizontal, word, number_coordinates):
                clear_terminal()
                for row in sub_modified_grid:
                    print(" ".join(row))
                snooze()
This was really only 2 lines off from the final working code that I used to fill the clues in with backtracking search! Pretty impressive even though it wasn't perfect.
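The generated version loops over the clues one after another, so a word placed for one clue is never carried into the next. A recursive form fixes that by passing the partially filled grid down to the remaining clues. The sketch below shows the general technique and is not necessarily the final code used here. To stay self-contained it uses a hypothetical place helper in place of crosswordlib.crossword_set and describes each clue by its list of cells rather than by clue number.

```python
import copy


def place(grid, cells, word):
    """Hypothetical stand-in for crossword_set: write word into cells if it fits."""
    if len(word) != len(cells):
        return False
    if any(grid[r][c] not in (" ", ch) for (r, c), ch in zip(cells, word)):
        return False
    for (r, c), ch in zip(cells, word):
        grid[r][c] = ch
    return True


def solve(clues, grid):
    """Depth-first backtracking over a list of (candidate_words, cells) pairs.

    Returns a filled copy of the grid, or None if no combination of words fits.
    """
    if not clues:
        return grid  # every clue has been placed
    words, cells = clues[0]
    for word in words:
        attempt = copy.deepcopy(grid)
        if place(attempt, cells, word):
            result = solve(clues[1:], attempt)  # recurse on the remaining clues
            if result is not None:
                return result
        # otherwise fall through and try the next word (backtrack)
    return None


grid = [[" ", " ", " "], [" ", "#", "#"], [" ", "#", "#"]]
across = [(0, 0), (0, 1), (0, 2)]
down = [(0, 0), (1, 0), (2, 0)]
solution = solve([(["dog", "cat"], across), (["cow", "ten"], down)], grid)
for row in solution:
    print(" ".join(row))
```

In this example "dog" is tried first, neither down word fits it, and the search backtracks to "cat" before placing "cow".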
Overall this was a failure in terms of writing a fully automatic crossword solver. I basically solved the crossword manually, but using ChatGPT and GPT-4 to cheat. I don't know how I could automatically solve crosswords with this tool. Maybe a character-based LLM could be used for my original idea, though I don't know if there is one out there.
It was also a failure in terms of OCR. I can't believe that I couldn't parse digits out of these little PNG images. Very disappointing.
The thing that I would say was a success in this project was pair programming with ChatGPT. It was very capable at understanding and implementing my instructions to its programming ability level - which is somewhat weak. It is fantastic that you can have it tweak and refactor code just by describing changes you want done. It can write some things correctly. Other things it gets wrong. You need to fix bugs in the code and help it along. But it is miraculous to me that it is able to do what it does. I have never seen any program synthesis tool anything like this before - and it's not just program synthesis, it's much more general.
I've included the code that we worked on together. ChatGPT wrote about 80% of it or more.
Hey! I am an author of the paper "Language Models are Crossword Solvers" (link). We have been able to solve crosswords with close to 90% accuracy. You can find the repository here: https://github.com/espressoVi/LMCrossword. Let me know if this helps or if you have any questions.