Help request: regular expressions... I mean 'patterns'.

Hi.
I would like to write my project loader so that it works from a single ‘copy’ from Codea, so I need to split the returned string into tabs. I would like to use regular expressions for it, but I can't figure out how to do what I want. Can you provide a piece of code that does the job? Here is a sample string:

    local data = [[

--# Main
-- test project loader


function setup()

end


function draw()

end


--# Tab1
Tab1 = class()

function Tab1:init(x)
    -- you can accept and set parameters here
    self.x = x
end

function Tab1:draw()
    -- Codea does not automatically call this method
end

function Tab1:touched(touch)
    -- Codea does not automatically call this method
end

--# Tab2
Tab2 = class()

function Tab2:init(x)
    -- you can accept and set parameters here
    self.x = x
end

function Tab2:draw()
    -- Codea does not automatically call this method
end

function Tab2:touched(touch)
    -- Codea does not automatically call this method
end

    ]]

I would like a function split(data) that returns a table:

  • line 1: “Main”.
  • line 2: string of the Main: "-- test project … ".
  • line 3: “Tab1”.
  • line 4: string of tab 1: “Tab1 = class(). …”.
    Etc…

I could write it with string.find and string.sub, but I would like to see how to do it with matches, captures and regular expressions. I think it is possible in a couple of elegant lines. Can anyone show me how?

First of all, Lua doesn’t use RegExp but something called patterns, which aren’t as powerful as RegExp. Anyway, Codea has this functionality natively. If you copy a project (tap and hold, then select copy), then tap and hold the + and press “paste into project”, you get the desired result. For demonstration purposes, I’ll take a shot at it anyway.
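
A small illustration of the syntax differences may help here. It is only a sketch: the sample strings and the variable names (`line`, `name`, `lazy`, `greedy`) are made up for the example. Lua patterns escape with % rather than \, use "-" as a lazy repetition operator, and have no alternation (|).

```lua
-- "-" is a magic character in Lua patterns, so a literal "-" is written "%-".
local line = "--# Main"

-- %-%-# matches the literal marker "--#", %s matches one whitespace character,
-- and (%w+) captures the run of letters and digits that follows.
local name = string.match(line, "^%-%-#%s(%w+)")
print(name)    --> Main

-- ".-" is the lazy counterpart of ".*": it matches as few characters as possible.
local lazy = string.match("<a><b>", "<(.-)>")
local greedy = string.match("<a><b>", "<(.*)>")
print(lazy)    --> a
print(greedy)  --> a><b
```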

Thank you @Zoyt, I know that.
My question arises because, while it works fine with small files, there are problems with very big files (I can't even copy them from Safari!), and I want people to load a small program that will load the big one for them…
Anyway, my question still holds (with ‘patterns’ instead of ‘reg exp’ ;-) ).

Hello @Jmv38. How about this (updated: improved):

```
function setup()
    -- Your data string here...
    data = [[

    ]]

    local first, last = string.find("\n"..data, "\n%-%-#%s")
    if first then
        for s in string.gmatch(string.sub(data, last).."\n--# ", "(.-)\n%-%-#%s") do
            local tabName, tabContents = string.match(s, "(%w+)\n(.*)")
            print("Tab named: "..tabName)
            print("--- Contents begin ---")
            print(tabContents)
            print("--- Contents end ---")
        end
    end
end

function draw() background(0) end
```

That looks great @mpilgrem, except… I can't copy the code! Can you use the ~ format instead of this nice colored one? Thanks

I managed to copy it by copying the text above it too (!).
That works perfectly! You are The Pattern Master. Thanks.
Actually, I started reading the Lua manual on patterns. Its explanations are great, but it would still have been a headache to come up with "\n--# ", "(.-)\n%-%-#%s" and "(%w+)\n(.*)" … :-)

For those who don't speak the ‘patterns language’ fluently, here is a tentative translation:

function split(data)
    -- code from Obi Wan Mpilgrem, 6th dan Master Jedi in Lua Patterns
    -- translation by your servant, the young padawan Luke JMV38.
    -- It took me 60 min just to understand what the following 4 lines mean...
    -- To protect newbies, I put a ? each time I saw something brilliant in that code.
    -- Gosh! Where are my sunglasses?
    -- So here is the translation for white belts:
    --
    -- In the string 'data'
    -- (to which we ? add a new-line character "\n" at the beginning in case there is none),
    -- find the 'first' and 'last' character positions of the
    -- substring 'new-line'..'--# ' = "\n--# " in C = ? "\n%-%-#%s" in patterns

    local first, last = string.find("\n"..data, "\n%-%-#%s")

    -- if and only if ? there is such a substring, there is something to do
    if first then
        -- string.sub(data, last): take the part of data that ? starts after the first "\n--# "
        -- string.sub(data, last).."\n--# ": and ? add a terminating "\n--# "
        -- each time you find in that string the smallest [= ? "(.-)"] string that matches:
        --         ......characters...... "\n--# ",
        -- return the '......characters......' in 's', ? without the "\n--# ", and do:

        for s in string.gmatch(string.sub(data, last).."\n--# ", "(.-)\n%-%-#%s") do

            -- return from s
            --    first: the biggest word (run of letters and digits) directly before the first new line ? "(%w+)\n"
            --            ('word' means that ? spaces are not included)
            --    second: all the characters found after this new line ? "\n(.*)"
            --    note that the new-line character itself ? is not returned...
            --    first is the tab name, second is the tab contents.

            local tabName, tabContents = string.match(s, "(%w+)\n(.*)")

            -- this part is standard:
            print("Tab named: "..tabName)
            print("--- Contents begin ---")
            print(tabContents)
            print("--- Contents end ---")
        end
    end
end
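
The original request asked for a table rather than printed output. Here is a minimal sketch of that, reusing the same patterns. The function name `splitTabs`, the nil check and the short test string are additions for illustration only, and it assumes tab names contain only letters and digits (as %w+ requires).

```lua
-- Hypothetical helper: returns { name1, contents1, name2, contents2, ... }.
local function splitTabs(data)
    local result = {}
    -- Pad with a leading newline so a marker on the very first line is found.
    local first, last = string.find("\n" .. data, "\n%-%-#%s")
    if not first then
        return result  -- no "--# " marker at the start of any line
    end
    -- 'last' indexes the padded string, so in 'data' it points at the first
    -- character of the first tab name. Append a terminating marker.
    local body = string.sub(data, last) .. "\n--# "
    for s in string.gmatch(body, "(.-)\n%-%-#%s") do
        local tabName, tabContents = string.match(s, "(%w+)\n(.*)")
        if tabName then  -- skip chunks that do not look like "Name\ncontents"
            result[#result + 1] = tabName
            result[#result + 1] = tabContents
        end
    end
    return result
end

local t = splitTabs("--# Main\nfunction setup() end\n\n--# Tab1\nTab1 = class()\n")
for i = 1, #t, 2 do
    print("Tab named: " .. t[i])
    print(t[i + 1])
end
```

The nil check matters for a tab with no contents at all: in that case the chunk has no new line after the name, string.match returns nil, and the unguarded print would raise an error.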

A variation on the theme, using empty captures (()) to locate the delimiting "\n--# ":

```
function setup()
    -- Your string here...
    local data = [[
    ]]
    local first
    for c1, c2 in string.gmatch("\n"..data.."\n--# ", "()\n%-%-#%s()") do
        if first then
            local s = string.sub(data, first, c1 - 2)
            local tabName, tabContents = string.match(s, "(%w+)\n(.*)")
            print("Tab named: "..tabName)
            print("--- Contents begin ---")
            print(tabContents)
            print("--- Contents end ---")
        end
        first = c2 - 1
    end
end
```
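
To see what the empty captures return in isolation, here is a tiny sketch. The string `subject` and the variable names are made up for the example. An empty capture () yields the current position in the subject string as a number rather than a substring.

```lua
local subject = "ab--# cd"

-- 'before' is the position where the marker starts,
-- 'after' is the position just past the whitespace that follows it.
local before, after = string.match(subject, "()%-%-#%s()")
print(before, after)

local left = string.sub(subject, 1, before - 1)   -- text in front of the marker
local right = string.sub(subject, after)          -- text behind the marker
print(left)   --> ab
print(right)  --> cd
```

In the loop above, the same positions are shifted by one (c1 - 2 and c2 - 1) because the subject string there has an extra "\n" in front of data.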