Dictionary access

Hi all,
I was wondering if there is a way to access (i.e. query) the iOS dictionary. Say I wanted to search for a certain word.

You’ll have to implement your own dictionary. Codea doesn’t have access outside of its sandbox in 1.4.6.

Thanks

If you search out the thread for Scram, there should be a link to the dictionaries I used for that game – something like 50k words done in the most brute force fashion. It works. I believe that same thread also has links to some other dictionaries that people have created with much more care.

Thanks, how slow is the search?

More or less instant.

Seems that you could suck in the dictionary from an external http source (or image) and create an indexed table like:

table[string] = true

Then check if table[string] exists. If it does, your word does.
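For anyone who wants to try the indexed-table idea, here is a minimal sketch in plain Lua. The names buildWordSet and isWord and the three-word sample list are made up for illustration; the real list would come from wherever you load your dictionary.

```lua
-- Build a set-style table from a comma-delimited word list.
-- buildWordSet and isWord are illustrative names, not part of Codea.
function buildWordSet(list)
    local set = {}
    -- "[^,]+" matches each run of characters between commas
    for word in string.gmatch(list, "[^,]+") do
        set[word] = true
    end
    return set
end

-- A word exists if its key is present in the set
function isWord(set, w)
    return set[w] == true
end

local words = buildWordSet("aardvark,aardwolf,abacus")
print(isWord(words, "aardwolf"))  -- true
print(isWord(words, "aard"))      -- false
```

Because the word itself is the key, only exact whole words match and no scan of the list is needed at lookup time.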

Yep, that works. It’s just that I found the time to pack the dictionary into a table made for a long start-up, while a simple string search of the dictionary as one massive, comma-delimited string took almost no time.
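A rough sketch of the single-string approach, for comparison. This is not the code from the game; DICT and findWord are hypothetical names, and the sketch assumes the string starts and ends with a comma so that every word can be matched as ",word,".

```lua
-- Search one big comma-delimited string for a whole word.
-- DICT and findWord are illustrative names; the leading and trailing
-- commas let the first and last words match as ",word," too.
DICT = ",aardvark,aardwolf,abacus,"

function findWord(dict, w)
    -- The fourth argument (true) asks for a plain substring search,
    -- so characters in w are never treated as pattern magic.
    return string.find(dict, "," .. w .. ",", 1, true) ~= nil
end

print(findWord(DICT, "abacus"))  -- true
print(findWord(DICT, "aard"))    -- false
```

Wrapping the word in commas is what stops "aard" from matching inside "aardvark". There is no set-up cost beyond having the string in memory, which is the trade-off described above.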

I think that the discussion that Mark is referring to was a beta discussion so may not be available to general view.

Regarding speed, I did some tests back then. I posted my results in that thread but as it’s hidden I’ll copy it below.


I’ve just run some tests and looking up a key in a table beats string.find quite comprehensively. My tests are probably not gold standard, though. My basic script runs the checkStringA function 100000 times, once with your checkStringA function and once with one loaded from a file which starts:

function checkStringA(w)
    return words.A[w]
end

words = {}
words.A = {}

words.A["aardvark"] = true
words.A["aardwolf"] = true
-- etc

Then I timed the execution. I should note very clearly that I ran these on my laptop rather than on Codea - my intention was to benchmark Lua, not particularly Codea. At 100000 iterations, your string.find was taking a noticeable amount of time: about 8.4s. The table lookup method was at 0.02s:

tmp% repeat 10 time lua testDict.lua
lua testDict.lua  8.41s user 0.00s system 99% cpu 8.410 total
lua testDict.lua  8.39s user 0.00s system 99% cpu 8.393 total
lua testDict.lua  8.40s user 0.00s system 99% cpu 8.406 total
lua testDict.lua  8.35s user 0.00s system 99% cpu 8.357 total
lua testDict.lua  8.41s user 0.00s system 99% cpu 8.409 total
lua testDict.lua  8.40s user 0.00s system 99% cpu 8.408 total
lua testDict.lua  8.47s user 0.00s system 99% cpu 8.477 total
lua testDict.lua  8.33s user 0.00s system 99% cpu 8.334 total
lua testDict.lua  8.35s user 0.00s system 99% cpu 8.352 total
lua testDict.lua  8.39s user 0.00s system 99% cpu 8.394 total
tmp% repeat 10 time lua testDict.lua
lua testDict.lua  0.02s user 0.00s system 94% cpu 0.026 total
lua testDict.lua  0.02s user 0.00s system 95% cpu 0.021 total
lua testDict.lua  0.02s user 0.00s system 94% cpu 0.021 total
lua testDict.lua  0.02s user 0.00s system 94% cpu 0.021 total
lua testDict.lua  0.02s user 0.00s system 94% cpu 0.021 total
lua testDict.lua  0.02s user 0.00s system 94% cpu 0.021 total
lua testDict.lua  0.02s user 0.00s system 94% cpu 0.021 total
lua testDict.lua  0.02s user 0.00s system 94% cpu 0.021 total
lua testDict.lua  0.02s user 0.00s system 94% cpu 0.021 total
lua testDict.lua  0.02s user 0.00s system 95% cpu 0.021 total

On the other hand, you do seem to end up using about twice the memory (first one is the string method):

tmp% /usr/bin/time -l lua testDict.lua
        0.84 real         0.84 user         0.00 sys
    700416  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
       179  page reclaims
tmp% /usr/bin/time -l lua testDict.lua
        0.01 real         0.00 user         0.00 sys
   1560576  maximum resident set size
         0  average shared memory size
         0  average unshared data size
         0  average unshared stack size
       389  page reclaims

So: twice the memory, but about 400 times faster at 100000 iterations (8.4s against 0.02s).
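If you want to repeat this kind of comparison from inside Lua rather than with the shell's time command, something like the sketch below should do. It is not the original testDict.lua; timeLookups and the tiny stand-in data are assumptions for illustration, and with only three words the gap will be far smaller than with a real dictionary.

```lua
-- Time n calls of a lookup function using os.clock (CPU seconds).
-- timeLookups is an illustrative helper, not the original test script.
function timeLookups(lookup, w, n)
    local start = os.clock()
    for i = 1, n do
        lookup(w)
    end
    return os.clock() - start
end

-- Tiny stand-in data; swap in a real word list for meaningful numbers
local list = ",aardvark,aardwolf,abacus,"
local set = { aardvark = true, aardwolf = true, abacus = true }

local function byString(w)
    return string.find(list, "," .. w .. ",", 1, true) ~= nil
end

local function byTable(w)
    return set[w] == true
end

print("string.find:", timeLookups(byString, "abacus", 100000))
print("table lookup:", timeLookups(byTable, "abacus", 100000))
```

Words near the end of the string are the slow case for string.find, so it is worth timing a late word as well as an early one.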

So your search was in the above mentioned 50k words dictionary and you performed 100k searches in .02 seconds?

Andrew, I don’t dispute that table lookups are faster than string finds; they absolutely are. The thing is, an individual string find (which is what I needed for my scrabble-style game) took a negligible amount of time, while loading up the tables at the outset of the app made for a quite noticeable start-up lag.

0.00009 seconds is a performance I can live with.

Lose the RAM…speed rules.

As far as startup time, implement a splash screen for the user during the initial load to make that time not ‘feel’ like a delay.
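One way to make the splash screen idea work is to load the table a few words at a time, so the screen can keep redrawing while the dictionary fills in. The sketch below uses a plain Lua coroutine; makeLoader is a made-up name, and in Codea the coroutine.resume call would presumably sit inside draw(), which is an assumption about how you structure the app.

```lua
-- Load a word list in small steps so a draw loop can show progress.
-- makeLoader is an illustrative name, not part of Codea.
function makeLoader(list, perStep)
    local set = {}
    local co = coroutine.create(function()
        local count = 0
        for word in string.gmatch(list, "[^,]+") do
            set[word] = true
            count = count + 1
            if count % perStep == 0 then
                coroutine.yield(count)  -- hand control back to the caller
            end
        end
        return count
    end)
    return set, co
end

local set, co = makeLoader("aardvark,aardwolf,abacus,abandon,abate", 2)
while coroutine.status(co) ~= "dead" do
    local ok, loaded = coroutine.resume(co)
    print("loaded so far:", loaded)  -- a splash screen would draw progress here
end
print(set["abate"])
```

The total load time is no shorter this way; the work is just spread over many frames so the app stays responsive.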

@Mark Totally agree. Different situations call for different solutions.

I have UK and Norwegian (bokmål) dictionaries lying around somewhere. They get quite big, and you need filesystem access to import them into Codea.

@Andrew_Stacey I tried using my .raw to .png process and I was able to load a 3 MB dictionary file from my PC to my iPad as a 600x600 .png image and saved it in Dropbox. I read the first 10,000 characters of the .png image on the iPad and created a table/string of 2,901 words. I did a word count without saving anything, and the image contains 105,034 words. I’m not sure if a table or string in Codea can hold that many words. Currently my program creates 1 character per table entry. I have to change it so each table entry is a word.

EDIT: I had no trouble creating the table of words, but apparently I created the .png image wrong on my PC. When I created the table of words, the 105,034th word was ‘hachure’, so I’m not even halfway through the full word list. I have to see what I did wrong on the PC.

EDIT: Apparently I took the square root of the wrong number when I created the .raw file on my PC. The correct size of the .png image is now 1025x1025. The word count is 300,249. I read the word table moving each word into a string and as far as I could tell, it took a second or less. It takes 17 seconds to read the .png image and create the 300,249 word table.
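For reference, the side length of a square image big enough to hold a file can be worked out as below. This is only a sketch: imageSide is a made-up helper, 3 bytes per pixel (one each for red, green and blue) is an assumption about the .raw to .png process, and the byte count is an illustrative figure of roughly 3 MB rather than the actual file size.

```lua
-- Side length of the smallest square image that can hold byteCount bytes.
-- imageSide is an illustrative helper; bytesPerPixel = 3 assumes plain RGB.
function imageSide(byteCount, bytesPerPixel)
    -- Round up so a partly filled last pixel still gets its own slot
    local pixels = math.ceil(byteCount / bytesPerPixel)
    -- The square root applies to the pixel count, not the byte count
    return math.ceil(math.sqrt(pixels))
end

print(imageSide(3150000, 3))
```

Taking the square root of the byte count instead of the pixel count (or the other way round) gives an image of the wrong size, which is the sort of slip described above.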