Anagram permutations and spell checking

Ignatz · March 11, 2014, 2:39am

@West - the code above has no constraints. My original code did, and that can be removed by deleting the second condition in the line below from the GetPuzzle function

--change from
if x~=0 and string.sub(s,1,1)=="q" then table.insert(arr,WordList[i]) end
--to
if x~=0  then table.insert(arr,WordList[i]) end

But this function isn’t optimised for speed, and lists all matches. The code I posted above should be considerably faster.

West · March 11, 2014, 3:22am

Ah got it! Thanks a lot! Would you mind if I used this in my game?

Ignatz · March 11, 2014, 3:25am

@West - of course you can!

West · March 11, 2014, 6:18am

@Ignatz - awesome - thanks!

dave1707 · March 11, 2014, 10:24am

@JakAttak I just did a google search for dictionaries and picked one that I liked. I created a png file with the words and sent it to my ipad. It’s a 1025x1025 png file that I read like a sprite with readImage. It takes 4-5 seconds to uncompress the png file and create a table with 300,249 word entries. I can scan the entire table in less than a second.

JakAttak · March 11, 2014, 6:34pm

@Ignatz, @West, @dave1707, thanks for the info. I will google a dictionary and see if I can find one that includes what I’m looking for.

EDIT: @West, the TWL (or maybe even the SOWPODS if i wish) seem perfect. Do you know if these are licensed / copyrighted in anyway?

Ignatz · March 12, 2014, 3:22am

@West, @dabuek - I tried the approach of creating a number composed of primes for each dictionary word, and seeing if it divides cleanly into the equivalent number made up of the letter string provided… I would have expected it to be faster than my approach above, because once the numbers are calculated, it only has to do one division per word tested. But it seems to be about the same speed. And of course, it takes some time up front to calculate the numbers.

If you want to try it out…

This function calculates the number for each word in a list w, and returns a table with the corresponding numbers. I arranged letters by frequency to keep the numbers smaller and avoid overflow (I haven’t checked if there is any overflow).

function IndexWords(w)
    local p={2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,101}
    chars=string.upper("etaoinsrhldcumfpgwybvkxjqz")
    indexChar={}
    for i=1,string.len(chars) do
        indexChar[string.sub(chars,i,i)]=p[i]
    end
    local n={}
    for i=1,#w do
        local W=w[i]
        local x=1
        for j=1,string.len(W) do
            local e=string.sub(W,j,j)
            x=x*indexChar[e]
        end
        n[i]=x
    end
    print(n[#n])
    return n
end

The function below finds the longest word matching a string of letters str, and prints it (as before, it assumes precalculation of a table LengthChange which contains the word count at the point where the length of the words increases (assuming they are sorted short to long).

function DoIndexed()
    local str=string.upper(Letters)
    if u<2 then print("too few letters") return end
    if u>=9 then j=#WordList else j=LengthChange[u+1] end
    local x=1
    for j=1,string.len(str) do
        local e=string.sub(str,j,j)
        x=x*indexChar[e]
    end
    for i=j,1,-1 do
        if math.fmod(x,index[i])==0 then print(WordList[i]) return end
    end
    print( "No words found")
end

dabuek · March 12, 2014, 7:53am

@Ignatz if I understand your code correctly, it’s not quite what I was thinking of. I was thinking of using the calculated product as the index to your word list. E.g. table[P(word)] = [word, anagram1, anagram2, anagram3, …] (or some ordering of that). It might be more memory-efficient to could store indices into your original word list.

Admittedly I hadn’t considered the case where you were looking for words shorter than your original string, but I’m sure you can do better than searching backwords through the entire word list.

There are 511 unique subsets that would need considering (fewer if there are repeated letters). 511 lookups in the worst-case seems pretty good although you need to construct each of them which may be time-consuming.

(The 511 number comes summing (9 choose k) for k=1…9)

Ignatz · March 12, 2014, 8:00am

@dabuek - I don’t follow the case for 511 subsets. Are you talking about some sort of hash table?

Anyway, the current search is so fast, I think it’s adequate.

dabuek · March 12, 2014, 10:02am

@Ignatz I realise this is largely academic, since you have a solution that works fast enough. That being said:

What I was saying is there are 511 ways of choosing a subset of the 9 characters in the list which you can then test against your table of prime products.

I.e. there is only way of choosing 9 characters, there are 9 ways of choosing 8 characters, 36 ways of choosing 7, 84 ways of choosing 6, 126 of choosing 5, 126 of choosing 4, 84 of choosing 3, 36 of choosing 2 and 9 of choosing 1. If you sum them all up you get 511.

The optimisation I was thinking of would involving calculating the product for each of the the 511 cases and testing that against your lookup table.

Ignatz · March 12, 2014, 10:12am

@dabuek - ok, I got it, that sounds a good method