In an effort to improve my coding knowledge and skills, I have been programming a Scrabble game in Lua, all very experimental. In fact, I just spent the last 2 days redoing the back end data structure and discovered what method colons are really about!
I have my DAWG code working and it’s able to test words and find anagrams very quickly. Unless there’s blanks, it’s pretty much instant. However, I am building the graph from scratch at runtime (I wish the user to be able to edit the dictionary! But that’s a future issue…)
The word list lives as a huge comment on its own tab (~290k words). Easier to code and same access speed as a file (i.e. I think it’s all syntactic sugar). I have a loop that reads the file and greps each word and sends it to be added to the dictionary.
function Dawg:makeDawg() local words = readProjectTab(dict) for word in string.gmatch(words,"%a+") do self:addWord(word) end end
How can I hack this into a coroutine so my game doesn’t freeze up for 10 seconds? A few seconds pause is okay, but… All attempts at optimization have yielded… wait for it… ZERO speed gain! The Lua JIT compiler is just that good! The readProjectTab command time usage is negligible, as is adding words to the trie. All of the time is actually spent pruning the trie. I’d skip pruning, but that’s a difference between using 25Mb versus hogging 250Mb.
I can use a regular string.match() and keep a position pointer, but is there a more elegant solution? I need to process about 400+ words per frame. Or, is this THE solution?
As a side note, when compiled as a native app, how does Codea respond to iOS requests to dump memory?
@syntonica I’m not sure how your using your ~290K words, but I have a dictionary file of 300,249 words. It takes .8 seconds to read the file and create a table of words. It takes .023 seconds to read each word in the table. If I do a search for words in the format of …a…e… where the . can be any letter, I find 492 words and it takes .32 seconds. I’m doing this on an iPad Air. You should also look into using readText and saveText.
@dave1707 Are you pruning your trie? I’m also using a lowly iPad mini 2, so not the fastest horse in the race! Edit: iPad Air isn’t that much faster than the mini. Which algorithm and data structure are you using?
Not sure what read/saveText are, but I’ll look into them. Thanks!
As an aside, I’ve got using string.match implemented now, ugly code, but I can cream it into a coroutine now.
What do you mean by pruning a trie and why. I put all of my 300,249 words in a table and read the table each time. Using the table is extreamly fast. I can read every word in the table in .023 seconds.
@syntonica If I do a search for every word that has the following letters
str...t, it will find 203 words in .9 seconds. That’s
str anywhere in the word followed by any 3 letters followed by a
t. Can you give me an example of the letters that you use with the 2 blanks. I don’t have anything written to search my list for that but I’ll see what I come up with.
Okay. I’ve hacked in a counter to do 400 iterative before yielding, but it never gets there. It calls addWord and never comes back.
What is up with coroutines?
@dave1707 A trie (short for “reTRIEval”) is a data structure shaped like a tree. Every branch represents a letter in a word and traversing it from the top (called the root!?!) it will spell every word in the dictionary you feed it. Building it is expensive, but I can find every word that can be made with a given seven letters in under .01 seconds. Add a blank and it takes up to 3 seconds. Worst case is seven common letters and both blanks and it takes up to 3 minutes. Verifying it by sending the word list back in takes about .2 seconds.
Using a flat list would take forever for these purposes. Generating a single Scrabble move would take about a day.
@syntonica When you say 7 letters and 2 blanks, are you saying all 9 letters words, or are you saying every words that has those letters from 2 characters to 9 characters. For instance using the 7 letters ( redsact) then some of the words would be (act) (red) (trace) and the 2 blanks could be any other 2 characters.
@dave1707 When I searched for the anagrams of “redsact”, I get 249 words from 2 to 7 letters in well under a second (“REDACTS” and “SCARTED” for the win!)
Add one blank and 4 seconds lasted, I get 390 words.
When I add the two blanks, I get 15,693 different words from 2 to 9 letters, but it took five minutes (I’m using a not-so-quick and dirty recursive anagram algorithm.) Really a bad case.
The algorithm you describe is what I will need to generate moves, but I need to find a better algorithm that prunes impossible words faster, which I think I have lying around here somewhere.
Still doesn’t help me at the front end though. Right now, I just have an alter pop up before the big freeze…
@syntonica I’m writing a routine now that will do what you show. I’ll let you know my times when I get it done.
@syntonica My routine takes forever. I didn’t even let it finish. I’ll have to try a different way.
Duh! Finally got my coroutine to work. All the docs say feed it a function. They didn’t say ANONYMOUS function! Just surround my statement with function() and end and it’s working very well. My splash screen is running at 40fps, so looks okay and my wait alert only pops up for a second now with no freezing, just a tad bit of sluggishness.
@syntonica I changed my search routine and came up with these numbers. A search with
redsact gave me 286 words in .6 seconds. A search with
redsact (1 blank) gave me 3455 words in 1.5 seconds. A search with
redsact (2 blanks) gave me 17642 words in 2.3 seconds. That’s an improvement over my previous search that took forever. I haven’t verified the words, but that would take an awful long time. I tried
a (2 blanks) and the words looked OK.
@syntonica I fixed an error in my code. Here are the new counts and times. Using
redsact I found 286 words in .62 seconds. Using
redsact & 1 blank I found 3,455 words in 1.43 seconds. Using
redsact & 2 blanks I found 16,022 words in 2.35 seconds. That’s searching 300,249 words.
Considering the speed of string matching, I changed my anagraming code to flat file and my numbers now mirror yours. Very bizarre since these sorts of operations used to be very expensive! It’s a trade off. The dawg is much faster with no blanks, but hopelessly slow with two. Even with optimization, I could only get it down to 30 second. Checking for duplicates really slows things down where the flat file method makes no duplicates.
Right now, I’m considering how to use flat file to generate moves. If I go flat file, then it may be possible to sort the word list by word length rather than alpha to keep search times even faster. However, in the end, I think the day will be faster in the end.