Getting a certain 'line' from a string [code-check]

Hey all, I wanted to make a function that gets a certain line of a string. For example, if you have the string

‘hello
my
name
is steven’

then there would be 4 ‘lines’ in the string, so i made a function that could ‘extract’ each of the ‘lines’

So here’s the code, now I wanted to know what you guys think about this function…
Is there a more optimised way? A way that might be faster with larger strings…?

(yes this is just in the setup function, easy testing, too lazy to put it in a separate class for you xd)

function setup()
    str = 'hehe\
lol\
haha\
ai'
    print(str)
    
    offset = 0
    y= 4
    
    for i=1, y do
        
        if i == y-1 then
            extract = string.sub(str, (string.find(str,'\
',offset) or -2) + 1)
        end
        if y == 1 then
            extract = str
        end
        
        offset = (string.find(str,'\
',offset) or -2) + 2
        
    end
    
    extract = string.sub(extract, 0, ((string.find(extract,'\
',0) or (string.len(extract)+1)) -1))
    print(extract)
end

this is a working example, change the 'y' to the line-number you want

Thx in advance guys

[btw, i didn't use CC for this because it's only a small example... once i finish this more (no not only this project, i am making something else ;) ) ]

Here comes a hard part of Lua… String matching. Very confusing for me, but I finally found how to do it from multiple Google searches.

The main thing I’ll be covering here is string.match. Depending on the pattern you supply, it can return multiple values.

--                         String to extract from          Pattern to match
local a, b = string.match("blah some text blah 47 blah", "blah (.-) blah (%d-) blah") -- Returns "some text", "47" (both strings)

Complicated? Yes.

The first string, the one to extract from, can be anything you want. The pattern is the complicated part. First is “blah”, as you can see in the first string, then (.-). What’s this? It’s the first capture. You put parentheses around something you want returned as a value. The . matches any single character. The - after it means “repeat, taking the shortest match”; * means “repeat, taking the longest match”. Say you had blah text blah text blah. If you searched for blah (.-) blah, it would return text. If it was blah (.*) blah, it would return text blah text. Since there are two spots where the string contains blah, some text, and then blah again, the choice between - and * matters. Since - is shortest, it stops the first time the rest of the pattern can match. * grabs as much as it can and only gives characters back until the rest of the pattern matches.

Since it’s looking for blah (.-) blah, it finds blah text blah, which matches. text qualifies as .-. Since it’s in parentheses, it gets returned as a value from string.match().
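
To see the difference between - and * for yourself, here’s a tiny sketch using the same blah text blah text blah string from above. The variable names (s, lazy, greedy) are just made up for the example.

```lua
-- Compare the lazy (-) and greedy (*) quantifiers on the same input.
local s = "blah text blah text blah"

-- "-" stops as soon as the rest of the pattern (" blah") can match.
local lazy = string.match(s, "blah (.-) blah")

-- "*" takes as much as possible, then backs off until " blah" matches.
local greedy = string.match(s, "blah (.*) blah")

print(lazy)   -- text
print(greedy) -- text blah text
```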

Then we have (%d-). %d is a digit, so it looks for digits. Again, it’s in parentheses and should be returned as a value (as a string, so "47" rather than the number 47). And again, it uses the - suffix to mean the shortest match.
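
One thing that’s easy to miss: captures always come back as strings, even when they only contain digits. A small sketch, assuming you want an actual number out of it (I’m using %d+ here, meaning one or more digits, and tonumber to convert; the variable names are made up):

```lua
local text = "blah some text blah 47 blah"

-- %d+ means "one or more digits"; both captures are returned as strings.
local words, digits = string.match(text, "blah (.-) blah (%d+) blah")

print(words)        -- some text
print(type(digits)) -- string

-- Convert the captured digits to a number before doing arithmetic.
local n = tonumber(digits)
print(n + 1)        -- 48
```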

If you want to match a special (“magic”) character literally, such as ( or ), put a % before it so the function knows you mean the character itself, not pattern syntax. The magic characters are ( ) . % + - * ? [ ] ^ $. And if you want a literal % in the pattern? Use %% instead.
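
Here’s a rough sketch of escaping in practice. The sample string is made up; the last line uses the “plain” fourth argument of string.find, which turns pattern matching off completely, so nothing needs escaping.

```lua
local s = "price (usd): 100%"

-- %( and %) match literal parentheses; the inner (.-) is the capture.
local inside = string.match(s, "%((.-)%)")

-- %% matches a literal percent sign after the captured digits.
local percent = string.match(s, "(%d+)%%")

-- With plain = true, "(" is searched for as ordinary text.
local pos = string.find(s, "(", 1, true)

print(inside)  -- usd
print(percent) -- 100
print(pos)     -- 7
```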

If you put % in front of a letter, it becomes a character class, which matches any one character of that kind. For example, %d matches any digit, so %d- in the pattern (the second argument of string.match) means “any digits can go here, I don’t care which”. On its own it isn’t returned as a value; it’s only returned if you wrap it in parentheses, like (%d-).
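
A quick sketch of the same class with and without parentheses (sample string and variable names are made up). Note that when a pattern has no captures at all, string.match returns the whole matched text.

```lua
local s = "item 12 costs 300"

-- No parentheses: the whole match is returned.
local whole = string.match(s, "item %d+")

-- Parentheses around the class: only the digits are returned.
local id = string.match(s, "item (%d+)")

-- First number is matched but not captured, second one is captured.
local cost = string.match(s, "item %d+ costs (%d+)")

print(whole) -- item 12
print(id)    -- 12
print(cost)  -- 300
```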

Once you have all that down, the rest is pretty much the same, except for the different character classes you can use instead of . or %d. As far as I know:

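. is any character
%d is a decimal digit
%s is a whitespace character (space, tab, newline, and so on)
%x is a hexadecimal digit (0-9, a-f, A-F)
%u is an uppercase letter
%a is any letter
%c is a control character (such as a tab or newline)
%l is a lowercase letter
%w is an alphanumeric character (a letter or a digit)
%z is the character with representation 0 (the NUL byte; deprecated since Lua 5.2)
%f[set] is a "frontier" pattern: it matches the empty spot where the next character is in set and the previous one isn't, more info here: http://lua-users.org/wiki/FrontierPattern
%bxy matches a balanced run that starts with x and ends with y, for example %b() matches a pair of matching parentheses and everything between them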
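
To get a feel for a few of these, here’s a small sketch. In a real pattern each class is written with a leading % (so %x, %u, %l, %w, %b(), %f[...]). The sample strings and variable names are made up for the example.

```lua
local s = "Hex: 0xFF, name = Steven_7 (draft (v2))"

local hex      = string.match(s, "0x(%x+)")    -- hexadecimal digits after "0x"
local word     = string.match(s, "%u%l+")      -- uppercase letter, then lowercase letters
local name     = string.match(s, "=%s*(%w+)")  -- skip whitespace, capture letters/digits
local balanced = string.match(s, "%b()")       -- balanced parentheses, nesting included

-- Frontier: %f[%a] matches the spot where a run of letters begins,
-- so this uppercases the first letter of each word.
local titled = string.gsub("the cat", "%f[%a]%a", string.upper)

print(hex)      -- FF
print(word)     -- Hex
print(name)     -- Steven
print(balanced) -- (draft (v2))
print(titled)   -- The Cat
```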

More info I’ve found:

http://stackoverflow.com/questions/2693334/lua-pattern-matching-vs-regular-expressions

http://www.lua.org/pil/20.2.html

Also, there’s string.gmatch, which is close to string.match, but is an iterator:

for (variables) in string.gmatch(stringtosearch, patterntomatch) do
    ...
end

It goes through the string like string.match, but it doesn’t stop at the first match. It keeps going until it reaches the end of the string, and each time the pattern matches, it runs the body of the for loop with the captures assigned to your variables.
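
For example, here’s a sketch with two captures per match. The settings string and the parsed table are just made up for illustration.

```lua
local settings = "width=320, height=480, depth=16"
local parsed = {}

-- Each match yields two captures: the letters before "=" and the digits after it.
for key, value in string.gmatch(settings, "(%a+)=(%d+)") do
    parsed[key] = tonumber(value)
end

print(parsed.width, parsed.height, parsed.depth) -- 320 480 16
```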

Sorry about the massive post, this is a massive topic…

wow thx for that information

so i should be using string.gmatch in order to do this the best :wink:
tho there are problems in the beginning, since it doesn’t start with ’\n’, and in the end, since my string doesn’t end with ’\n’

But I’ll take a closer look :wink:
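
One possible way around the start/end problem, as a sketch: instead of matching “text followed by a newline”, match “a run of characters that are not a newline” with [^\n]+. That way the string doesn’t need to start or end with a line break. The catch is that empty lines get skipped, so this only fits if you don’t care about blank lines. The lines table is just a name I picked.

```lua
local str = "hehe\nlol\nhaha\nai"
local lines = {}

-- [^\n]+ matches one or more characters that are not a newline.
for line in string.gmatch(str, "[^\n]+") do
    lines[#lines + 1] = line
end

print(#lines)   -- 4
print(lines[1]) -- hehe
print(lines[4]) -- ai
```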

function setup()
    str = 'hehe\
lol\
haha\
ai'
    print(str)
 
    offset = 0
    y= 4
 
    for i=1, y do
 
        if i == y-1 then
            extract = string.sub(str, (string.find(str,'\
',offset) or -2) + 1)
        end
        if y == 1 then
            extract = str
        end
 
        offset = (string.find(str,'\
',offset) or -2) + 2
 
    end
 
    extract = string.sub(extract, 0, ((string.find(extract,'\
',0) or (string.len(extract)+1)) -1))
    print(extract)
end

can just be:

function setup()
    str = 'hehe\
lol\
haha\
ai'
    print(str)
    
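    -- The $ anchors the last capture to the end of the string;
    -- without it the final lazy (.-) would match nothing.
    a, b, c, d = str:match("(.-)\
(.-)\
(.-)\
(.-)$")
    print(a, b, c, d)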
end

or

function setup()
    str = 'hehe\
lol\
haha\
ai'
    print(str)
    
    str = str .. "\
"
    for i in str:gmatch("(.-)\
") do
        print(i)
    end
end

or (almost the same as the code above, with a tiny tweak to the iterator, one line shorter)

function setup()
    str = 'hehe\
lol\
haha\
ai'
    print(str)
    
    for i in string.gmatch(str .. "\
", "(.-)\
") do
        print(i)
    end
end

The bottom two support any number of newlines (\n) you want.
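
If you only want one particular line (like the y in the original code), the same gmatch loop can be wrapped in a function that stops as soon as it reaches that line. This is only a sketch; getLine is a hypothetical helper name, not something built into Lua or Codea.

```lua
-- Hypothetical helper: returns line n of str (1-based),
-- or nil if the string has fewer than n lines.
local function getLine(str, n)
    local i = 0
    -- Appending "\n" lets the last line match "(.-)\n" too.
    for line in string.gmatch(str .. "\n", "(.-)\n") do
        i = i + 1
        if i == n then
            return line
        end
    end
    return nil
end

local str = "hehe\nlol\n\nai"
print(getLine(str, 2)) -- lol
print(getLine(str, 4)) -- ai
print(getLine(str, 5)) -- nil
```

Unlike a [^\n]+ pattern, this keeps empty lines (line 3 in the example is an empty string). If the string already ends with a newline, the appended one adds an extra empty line at the end.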

omg ok i totally have no brains in that at all xD

But thanks for the help :smiley: that’ll run a lot faster with large strings than my method xd

Honestly, I barely understand the basics, and have no idea how to go about advanced captures…

well thanks anyway… you just helped me out a lot…

at least you get it more than i do xd

and most of my other code of my complete project is still shitty xd

but i’ll get there, not in the most optimised way, but I’ll get there :wink:

One more thing I forgot to mention: if you want to match anything that’s not in a class, you capitalize the class letter. For example, %A matches anything that’s not a letter, and %D matches anything that’s not a digit.
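
A quick sketch of the capitalized classes (sample string and variable names are made up):

```lua
local s = "abc123def"

-- %D+ matches a run of non-digits, %A+ a run of non-letters.
local nonDigits  = string.match(s, "%D+")
local nonLetters = string.match(s, "%A+")

-- Remove everything that is not a digit.
local digitsOnly = string.gsub(s, "%D", "")

print(nonDigits)  -- abc
print(nonLetters) -- 123
print(digitsOnly) -- 123
```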

ok, thx :wink:

Very good example of useful capture formula @SkyTheOder, thanks!