String permutations

@LoopSpace I tried converting words and letters to numbers, but I didn’t see how that would work when I tried match anything. The next thing I tried was sorting the letters in each word and having a dictionary table of words with their sorted letters with each word. It took almost 5 seconds to read the dictionary file and create a table of words and their sorted letters. But on the bright side, instead of taking 1.7 seconds to find the 290 matching words, it took only .3 seconds. The thing is, I don’t think the 1.7 seconds to find matching words is that bad. You have to take into consideration that each time you pick new letters, you have to enter those letters in a input field, tap a button to find matching words, then look thru the list of matched words to determine what word you want to use.

@dave1707 To convert to numbers, you have to choose a prime for each letter and then multiply the numbers together. So cab = 5 x 2 x 3 = 30. Then t is a substring of s if s%t == 0.

Re speed: what about when the computer takes a turn? Then you don’t want the user waiting around doing nothing.

@LoopSpace I’ll play around with what you show above later, but here’s what I think is a good routine. You don’t need to do any sorting at all. You just create the dictionary as a table of words. Using the string.sub function and both the words and letters as strings, you just match a character in one string to a character in the other string. If there’s a match, you just replace the matched character with a ! so it’s not used again. Using this routine on my 300,000+ word dictionary file, it took only .4 seconds to find the 290 words that took 1.7 seconds with the other routine. So you create the dictionary file once and then change the letters with parameter.text and press parameter.find .

function setup()
    parameter.text("letters","ivnetxovfsne")
    parameter.action("find",find)
    found={}
    dictionary={"one","two","three","four","five","six","seven","eight","nine","ten"}
    for a,b in pairs(dictionary) do
        find(b,letters)
    end
    print("letters = "..letters)
    print("matched words")
    print(table.concat(found,"\
"))
end

function find(word,letter)
    for z=1,#word do
        c=string.sub(word,z,z)
        letter,n=string.gsub(letter,c,"!",1)
        if n==0 then
            return
        end
    end  
    table.insert(found,word)
end

@LoopSpace Here’s an example of converting words to a number value like you suggested. Using the find routine on my 300,000+ dictionary took .8 seconds to find the 290 words. That was 2x longer then the time using string.gsub. I guess the number for each word could be calculated in advance to speed things up.

function setup()
    val={a=2,b=3,c=5,d=7,e=11,f=13,g=17,h=19,i=23,j=29,k=31,l=37,m=41,n=43,
            o=47,p=53,q=59,r=61,s=67,t=71,u=73,v=79,w=83,x=89,y=97,z=101}
    
    -- calc value of letters
    letters="ivnetxovfsne"
    lettersVal=1
    for z=1,#letters do
        c=string.sub(letters,z,z)
        lettersVal=lettersVal*val[c]
    end
    
    found={}
    
    dictionary={"one","two","three","four","five","six","seven","eight","nine","ten"}
    for a,b in pairs(dictionary) do
        find(lettersVal,b)
    end
    
    print("letters = "..letters)
    print("matched words")
    print(table.concat(found,"\
"))
end

function find(letter,word)
    local wv=1
    for z=1,#word do
        wv=wv*val[string.sub(word,z,z)]    -- calc value of word
    end
    if letter%wv==0 then
        table.insert(found,word)    
    end
end

Interesting, but I would expect a significant speed-up by precalculating. I’ll have to load in a dictionary to test these … I’m getting intrigued.

@LoopSpace I tried the precalculate on each word and got a run time of .18 seconds. Quite a speed up.

@dave1707 I’m trying a hash table lookup first. How are you calculating the times? I want to be sure I’m comparing like with like.

@LoopSpace

    st=os.clock()
    ..... code to time
    en=os.clock()
    print(“time ”,en-st)

This code compares the primes method with a method based on hashing.

-- Dictionary

function setup()
    local d = readText("Documents:Dictionary")
    local dict = split(d,"[^\
\\r]+")
    primes = {2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97,
    101}
    local nd = 0
    for k,v in ipairs(dict) do
        -- for k=1,100 do
            -- v = dict[k]
        if v:find(" ") == nil then
            nd = nd + 1
        end
    end
    print("Dictionary size: " .. nd)
    ndict = {}
    sdict = {}
    local w,t,e,s,n
    s = os.clock()
    for k,v in ipairs(dict) do
    -- for k=1,100 do
        -- v = dict[k]
        if v:find(" ") == nil then
            t = split(v,".")
            table.sort(t)
            w = table.concat(t)
            if not sdict[w] then
                sdict[w] = {}
            end
            table.insert(sdict[w],v)
        end
    end
    e = os.clock()
    print("Hashing: " .. (e-s))
    
    s = os.clock()
    for k,v in ipairs(dict) do
    -- for k=1,100 do
        -- v = dict[k]
        if v:find(" ") == nil then
            t = split(v,".")
            n = 1
            for l,u in ipairs(t) do
                n = n * primes[u:upper():byte()-64]
            end
            
            if not ndict[n] then
                ndict[n] = {}
            end
            table.insert(ndict[n],v)
        end
    end
    e = os.clock()
    print("Primes: " .. (e-s))
    
    parameter.text("letters","gtexbju")
    parameter.action("doFind",doFind)
    --]]
end


--- splits 'str' into pieces matching 'pat', returns them as an array
function split(str,pat)
  local tbl={}
  str:gsub(pat,function(x) tbl[#tbl+1]=x end)
  return tbl
end

function findHash(l)
    local w = split(l,".")
    table.sort(w)

    local i = {}
    for k=1,#w do
        table.insert(i,0)
    end
    local r = {}
    local j = #w
    local s
    while j > 0 do
        i[j] = i[j] + 1
        if i[j] == 2 then
            i[j] = 0
            j = j - 1
        else
            j = #w
            s = {}
            for l=1,#w do
                if i[l] == 1 then
                    table.insert(s,w[l])
                end
            end
            s = table.concat(s)
            if sdict[s] then
                for k,v in ipairs(sdict[s]) do
                    table.insert(r,v)
                end
            end
        end

    end

    return r
end

function findPrime(l)
    local t = split(l,".")
    local n = 1
    for l,u in ipairs(t) do
        n = n * primes[u:upper():byte()-64]
    end
    local r = {}
    for k,v in pairs(ndict) do
        if n%k == 0 then
            for l,u in ipairs(v) do
                table.insert(r,u)
            end
        end
    end
    return r
end

function doFind()
    local s,t,e
    
    s = os.clock()
    findHash(letters)
    e = os.clock()
    
    t = findHash(letters)
    print("Hashing: ")
    print(#t .. " words found in " .. (e-s) .. "s")
    
    s = os.clock()
    findPrime(letters)
    e = os.clock()
    
    t = findPrime(letters)
    print("Primes: ")
    print(#t .. " words found in " .. (e-s) .. "s")
end

I have a dictionary stored as a text asset. It reports just over 370,000 words.

In terms of initialising, I’m getting about 18s for the hash table and 25s for the primes table. Obviously, this could be stored in the file itself.

When it comes to finding all the words, with 7 letters I get of the order of 0.2s for the primes method, but an amazing 0.001s for the hash method.

I need to double-check my routines. With the letters ivnetxovfsne then the hash method found about 1400 words in less than 0.08s while the prime method only found about 500 words in 0.25s

@LoopSpace Your reusing letters in your hash routine. I put the word one in the dict table. When I used the letters one, hash found 1 word. When I entered onee in letters, hash found 2 words. When I entered oneee in letters, hash found 3 words. It kept reusing the letters on for the three e’s. When you said you were going to use hash for the dictionary, I wasn’t sure how that was going to work. But I took your hash code and used just a few words in the dict table. I printed out sdict to see what was there and I saw what you were doing. You sorted the letters in each word and any words that had the same letters were considered the same word with the actual words in another table. I wonder how much the full dictionary table was reduced.

@LoopSpace I ran your hash routine on my dictionary file. The full dictionary table is 300,130 words. The hash table was 269,725 words. So hash reduced the table entry size by 30,405 words. That’s about 10% smaller. So I think there should be about a 10% reduction in speed.

Edit: I used the letters ivnetxovfsne on my 300,130 words and the hash routine found 1,189 words in .04 sec and the prime routine found 436 words in .08 sec.

Edit 2: With the letters ivnetxovfsne, you have 2 v’s, 2 n’s, 2 e’s. I removed the duplicate letters and got 299 matches with both the hash and prime routines.

Good catch!

Here’s my modified search:

function findHash(l)
    local w = split(l,".")
    table.sort(w)

    local i = {}
    for k=1,#w do
        table.insert(i,0)
    end
    local r = {}
    local j = #w
    local s
    local f = {}
    while j > 0 do
        i[j] = i[j] + 1
        if i[j] == 2 then
            i[j] = 0
            j = j - 1
        else
            j = #w
            s = {}
            for l=1,#w do
                if i[l] == 1 then
                    table.insert(s,w[l])
                end
            end
            s = table.concat(s)
            f[s] = true
        end
    end
    for s,u in pairs(f) do
        if sdict[s] then
            for k,v in ipairs(sdict[s]) do
                table.insert(r,v)
            end
        end
    end

    return r
end

With that, I get the same number of matches for the two routines. With the ive… set, I get 528 in .25s for primes and .073 for hash.

My dictionary also shows about a 10% reduction in size.

Loopspace How do you print the actual words found by your routine

@colincooper One way you can print the words in @LoopSpace code above is add the line print(table.concat(t,"\ ")) after the line of code t=findHash(letters) or t=findPrime(letters) in the function doFind() .

Thank you