String.dump followed by load not working

Hi @John @sim @jfperusse,

I’ve been attempting to use Lua’s string.dump function but when I then attempt to load the generated binary string it fails with even the simplest of functions (same with an empty function):

function setup()
    local testFn = function()
        return "Hello Lua!"
    end
    local dumped = string.dump(testFn)
    local fn, err = load(dumped)
    if fn == nil then
        error(err, -1)
    end
    print(fn())
end

In this case the error is:

binary string: bad binary format (integer overflow)
stack traceback:
	[C]: in function 'error'
	Documents/Test/Main.lua:10: in function 'setup'

I have no idea if this is a problem with Lua itself or Codea at this point but will continue to investigate.

Cheers,
Steppers

I found some code of mine that uses string.dump. Here’s how I call it; the load call is the same as yours.

    s2=string.dump(functionToDump,true)

@dave1707 I tried stripping the debug data too but I see exactly the same error. I’ve used this before without issue so I expect something must have broken in the last 2 years or so.

I have a ‘Packager’ project which compiles other projects to bytecode that was known to be working before but is now also broken.

I’ll be checking this in plain old Lua on Linux shortly to hopefully eliminate that possibility. If it’s there too, I’ll report the issue to the Lua devs to get a fix upstream.

It turns out that I’m just after the bytecode that’s created from the dump function. Then I format the code into a dump showing the opcodes of the lua code.

I’ve now tried the above code on all Lua 5.4 versions (5.4.0 - 5.4.6) on Linux and none of them exhibit this issue so this is definitely on the Codea side or Codea’s Lua modifications (if any?).
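
For anyone who wants to repeat that check outside Codea, this is roughly the kind of standalone script that does it; a minimal sketch for a stock Lua 5.4 interpreter. The helper name roundTrip is only for illustration, and passing "b" as the mode argument makes load accept binary chunks only.

```lua
-- Round-trip a function through string.dump and load.
-- roundTrip is a hypothetical helper name, not part of Lua or Codea.
local function roundTrip(fn, strip)
    local dumped = string.dump(fn, strip)
    -- mode "b" restricts load to precompiled (binary) chunks
    local loaded, err = load(dumped, "=roundTrip", "b")
    if loaded == nil then
        return nil, err
    end
    return loaded
end

local testFn = function()
    return "Hello Lua!"
end

-- Try both with and without debug information
for _, strip in ipairs({ false, true }) do
    local fn, err = roundTrip(testFn, strip)
    if fn then
        print("strip=" .. tostring(strip), fn())
    else
        print("strip=" .. tostring(strip), "load failed: " .. err)
    end
end
```

If both iterations print the greeting, the interpreter's dump/load pair is consistent; a "bad binary format" message points at the dump side or the load side of that particular build.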

@sim @John The following code works in Codea 3.1 (208). Prints the string qwertyuiop. Doesn’t work in the current Codea version.

function setup()
    s1=string.dump(xx)
    load(s1)()
end

function xx() 
    print("qwertyuiop")
end

@sim @John On very close inspection of the dumped bytecode I think I’ve found the issue (I spent a while looking at Lua’s ldump.c).

It seems Codea is adding an erroneous additional ‘constants’ chunk to the bytecode for some reason? The bytes I removed are highlighted (by hand) in red on the left:

Hey everyone.

I can confirm the issue is on our side and is caused by our “halt” optimization for the Air Code debugger. I am not sure yet why this “patch” that we apply to Lua has to modify the dump function, but we can probably remove this part.

We’ll keep you posted.

Update:

Actually, the patch was incorrectly applied for the dump method, and the code was being dumped twice, which is why you see that “H G” two times in the above screenshot.

Thanks a lot for the details, I’ll include a fix in the next update.


@jfperusse Awesome! Thanks for looking into that so quickly! I’ll keep an eye out for the next update.

I’m glad all the detail helped :sweat_smile:


@Steppers, @dave1707 Beta build 407 includes the fix. Thank you!

@jfperusse Tried the string.dump and load code and it works ok.


Perfect, I’ll check it out. Thank you!


@Steppers @jfperusse Does either of you have a file layout for Lua 5.4 bytecode? I wrote a Codea program a while back that dumps bytecode for version 5.1. Codea now uses version 5.4, so my program no longer dumps the file correctly. I searched for the layout but haven’t found anything. My program calls string.dump, parses the resulting string, and formats it into a readable dump that also shows the opcodes.

https://the-ravi-programming-language.readthedocs.io/en/latest/lua_bytecode_reference.html#instruction-notation

This was the best reference I could find but for the most part I think your best bet will be inspecting the 5.4 source code. I think the 32bit instructions are largely the same as 5.3 but I expect the dump layout (headers etc.) has probably changed a little since 5.1.


@dave1707 I also have this as part of an experimental project which could prove useful for extracting the data in 5.4 (the undump(bin) function is the entry point):

local undumpFunction -- forward declaration

local function undumpByte(D)
    return D:undump("B")
end

local function undumpInteger(D)
    return D:undump("j")
end

local function undumpNumber(D)
    return D:undump("n")
end

local function undumpLiteral(D, len)
    return D:undump("c" .. len)
end

local function undumpSize(D)
    local v = 0
    repeat
        local b = D:undump("B")
        v = (v << 7) | (b & 0x7F)
    until (b & 0x80) > 0
    return v
end

local undumpInt = undumpSize

local function undumpString(D)
    local len = undumpSize(D)
    if len == 0 then
        return ""
    else
        return undumpLiteral(D, len-1)
    end
end

local function undumpCode(D)
    local size = undumpInt(D)
    
    local code = {}
    for i=1,size do
        code[i] = D:undump("I4")
    end
    
    return code
end

local function undumpConstants(D)
    local size = undumpInt(D)
    
    local constants = {}
    for i=1,size do
        local t = undumpByte(D)
        
        local switch = {
            [0x00] = function() -- nil
                return nil
            end,
            [0x01] = function() -- false
                return false
            end,
            [0x11] = function() -- true
                return true
            end,
            [0x03] = function() -- int
                return undumpInteger(D)
            end,
            [0x13] = function() -- float
                return undumpNumber(D)
            end,
            [0x04] = function() -- short string
                return undumpString(D)
            end,
            [0x14] = function() -- long string
                return undumpString(D)
            end
        }
        constants[i] = switch[t]()
    end
    
    return constants
end

local function undumpProtos(D)
    local size = undumpInt(D)
    
    local protos = {}
    for i=1,size do
        protos[i] = undumpFunction(D)
    end
    
    return protos
end

local function undumpUpvalues(D)
    local size = undumpInt(D)
    
    local upvalues = {}
    for i=1,size do
        upvalues[i] = {
            instack = undumpByte(D),
            idx = undumpByte(D),
            kind = undumpByte(D)
        }
    end
    
    return upvalues
end

local function undumpDebug(D, f)
    local sizelineinfo = undumpInt(D)
    if sizelineinfo == 0 then
        f.lineinfo = ""
    else
        f.lineinfo = undumpLiteral(D, sizelineinfo)
    end

    local sizeabslineinfo = undumpInt(D)
    f.abslineinfo = {}
    for i=1,sizeabslineinfo do
        f.abslineinfo[i] = {
            ["pc"] = undumpInt(D),
            ["line"] = undumpInt(D)
        }
    end
    
    local sizelocvars = undumpInt(D)
    f.locvars = {}
    for i=1,sizelocvars do
        f.locvars[i] = {
            varname = undumpString(D),
            startpc = undumpInt(D),
            endpc = undumpInt(D)
        }
    end
    
    local sizeupvalues = undumpInt(D)
    for i=1,sizeupvalues do
        f.upvalues[i].name = undumpString(D)
    end
end

undumpFunction = function(D)
    local f = {}
    
    f.source = undumpString(D)
    f.linedefined = undumpInt(D)
    f.lastlinedefined = undumpInt(D)
    f.numparams = undumpByte(D)
    f.is_vararg = undumpByte(D)
    f.maxstacksize = undumpByte(D)
    f.code = undumpCode(D)
    f.constants = undumpConstants(D)
    f.upvalues = undumpUpvalues(D)
    f.protos = undumpProtos(D)
    undumpDebug(D, f)
    
    return f
end

local function undumpHeader(D)
    assert(undumpLiteral(D, 4) == "\x1bLua") -- signature (LUA_SIGNATURE)
    assert(undumpByte(D) == 0x54) -- version (5.4)
    assert(undumpByte(D) == 0x00) -- format (0 = official format)
    assert(undumpLiteral(D, 6) == "\x19\x93\r\n\x1a\n") -- LUAC_DATA, catches conversion errors
    assert(undumpByte(D) == 4) -- instruction size
    assert(undumpByte(D) == 8) -- integer size
    assert(undumpByte(D) == 8) -- float size
    assert(undumpInteger(D) == 0x5678) -- LUAC_INT, checks integer format/endianness
    assert(undumpNumber(D) == 370.5) -- LUAC_NUM, checks float format

-- Undumps a binary code blob to a useful structure
function undump(bin)
    local D = {
        i = 1, -- next read position (string.unpack positions are 1-based)
        undump = function(self, fmt)
            local v, i = string.unpack(fmt, bin, self.i)
            self.i = i
            return v
        end
    }
    
    undumpHeader(D)
    undumpByte(D) -- numupvalues
    return undumpFunction(D)
end

This is mostly a re-implementation of lundump.c from the Lua 5.4 source code.


@Steppers Looking at the undumpHeader function in your code, it doesn’t look like it matches 5.4 bytecode. Here’s code that dumps the bytecode. I can figure out some things, but an explanation of which bytes are which would make things a lot easier.

viewer.mode=FULLSCREEN

function setup()
    str=string.dump(xx)
end

function xx()
    a=3
    b=4  
    c=5 
    d=6 
end

function draw()
    background(0)
    y=HEIGHT-100
    x=0
    cnt=0
    for z=1,#str do
        v=str:byte(z)
        v1=str:sub(z,z)
        text(string.format("%02x",v),200+x*25,y)
        text(string.format("%s",v1),200+x*25,y+18)
        x=x+1
        if x>15 then
            y=y-50
            x=0
        end
    end 
end
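
A plain-Lua variant of the same hex dump can be handy for comparing the bytes against a desktop Lua build. This is only a sketch: hexRows and its width parameter are made-up names, and it returns text rows instead of drawing with Codea's text().

```lua
-- Format a binary string as rows of hex bytes plus a printable-ASCII column.
-- hexRows and `width` are hypothetical names for this sketch.
local function hexRows(bin, width)
    width = width or 16
    local rows = {}
    for first = 1, #bin, width do
        local last = math.min(first + width - 1, #bin)
        local hex, ascii = {}, {}
        for i = first, last do
            local v = bin:byte(i)
            hex[#hex + 1] = string.format("%02x", v)
            -- show printable characters, replace everything else with "."
            ascii[#ascii + 1] = (v > 31 and v < 127) and string.char(v) or "."
        end
        -- offset of the first byte, padded hex column, then the ASCII column
        rows[#rows + 1] = string.format("%4d  %-" .. (width * 3 - 1) .. "s  %s",
            first, table.concat(hex, " "), table.concat(ascii))
    end
    return rows
end

-- Same idea as xx() above; the function is only dumped, never called
local function xx()
    a = 3
    b = 4
end

for _, row in ipairs(hexRows(string.dump(xx))) do
    print(row)
end
```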

@dave1707 In what way does it not match?

You should only be calling my code like this:

function setup()
    local bytecode = string.dump(function(n) return n*2 end)
    local usableDump = undump(bytecode)
    
    -- usableDump.code is now a table of all the 32bit instructions.
    -- They will need further processing to figure out what they are though.
end
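
As a starting point for that further processing, here is a small sketch of splitting one instruction into its fields. The bit positions are my reading of lopcodes.h in the 5.4 source (opcode in the low 7 bits, then A, k, B and C for the iABC format), and decodeABC is a hypothetical helper name. The other formats (iABx, iAsBx, iAx, isJ) keep the opcode in the same place but pack the operands differently, so this does not cover them.

```lua
-- Split a 32-bit Lua 5.4 instruction into its iABC fields.
-- Assumed layout: opcode bits 0-6, A bits 7-14, k bit 15,
-- B bits 16-23, C bits 24-31.
local function decodeABC(instr)
    return {
        op = instr & 0x7F,
        a  = (instr >> 7) & 0xFF,
        k  = (instr >> 15) & 0x1,
        b  = (instr >> 16) & 0xFF,
        c  = (instr >> 24) & 0xFF,
    }
end

-- Hand-built instruction so the sketch runs without a real dump
local instr = 5 | (3 << 7) | (1 << 15) | (7 << 16) | (9 << 24)
local f = decodeABC(instr)
print(f.op, f.a, f.k, f.b, f.c)
```

With a real dump, each entry of usableDump.code could be passed to decodeABC in a loop; mapping the op number to a name still needs the opcode list from the source.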

@Steppers Here’s how I’m trying to use your code but I get nothing. I’m trying to use the bytecode from function xx(). I show the bytecode dump for it a few posts up.

What I’m really after is something like below, a file layout, where it describes what each group of bytes is used for.

Byte 1 something
Byte 2-4 Lua
Byte 5 version number
Byte 6-? something
Byte ?-? something
Byte ?-? something

Here’s a link to what I’m after, but for version 5.4 not versions 5.1 -5.3.

(lua_bytecode.md · GitHub)




function xx()
    a=3
    b=4  
    c=5 
    d=6 
end
    
function setup()
    --local bytecode = string.dump(function(n) return n*2 end)
    local bytecode=string.dump(xx)
    local usableDump = undump(bytecode)
    
    -- usableDump.code is now a table of all the 32bit instructions.
    -- They will need further processing to figure out what they are though.
end
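
For the header part at least, a byte layout like the one asked for can be sketched as follows. It assumes a stock 64-bit Lua 5.4 build (4-byte instructions, 8-byte integers and floats), which is what the asserts in undumpHeader also expect; headerLayout and describeHeader are hypothetical names, and the field descriptions are my reading of ldump.c rather than an official spec.

```lua
-- Header fields of a Lua 5.4 binary chunk, in order, with string.unpack formats.
-- Assumes 4-byte instructions, 8-byte lua_Integer and 8-byte lua_Number.
local headerLayout = {
    { name = "signature (ESC Lua)",            fmt = "c4" },
    { name = "version (0x54 for 5.4)",         fmt = "B"  },
    { name = "format (0 = official)",          fmt = "B"  },
    { name = "LUAC_DATA check bytes",          fmt = "c6" },
    { name = "sizeof(Instruction)",            fmt = "B"  },
    { name = "sizeof(lua_Integer)",            fmt = "B"  },
    { name = "sizeof(lua_Number)",             fmt = "B"  },
    { name = "LUAC_INT (0x5678)",              fmt = "j"  },
    { name = "LUAC_NUM (370.5)",               fmt = "n"  },
    { name = "upvalue count of main function", fmt = "B"  },
}

-- Walk the header and record the first/last byte of every field.
local function describeHeader(bin)
    local fields, pos = {}, 1
    for _, field in ipairs(headerLayout) do
        local value, nextPos = string.unpack(field.fmt, bin, pos)
        fields[#fields + 1] = {
            name = field.name, first = pos, last = nextPos - 1, value = value
        }
        pos = nextPos
    end
    return fields, pos -- pos is where the function prototype starts
end

local fields = describeHeader(string.dump(function() end))
for _, f in ipairs(fields) do
    print(string.format("Byte %2d-%-2d  %s", f.first, f.last, f.name))
end
```

Everything after the position returned by describeHeader is the function prototype itself, which is the part undumpFunction walks through field by field.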

@Steppers I still couldn’t get your code to output anything. The variable D, which is a table, has a size of 0. I added some code so I could get some info out of it: in undump(bin) I build a table of the byte positions of each field your code reads. The program shows a hex dump of the bytecode, and if you triple tap the screen it uses that table to group the bytes by field. I’m still looking for a 5.4 file format that explains what each byte or group of bytes represents.


viewer.mode=FULLSCREEN

function setup()
    fill(255)
    showHexDump=true
    tab={1}
    dx,dy=0,0
    textMode(LEFT)
    font("Courier-Oblique")
    
    bytecode = string.dump(function(n) 
        a=1
        b=2
        c=3
        d=a+b+c
        for z=a,b do
            print(z)
        end

    return n*2 end)
    
    local usableDump = undump(bytecode)
end

function draw()
    background(0)
    if showHexDump then
        hexDump(bytecode,1,#bytecode)
    else
        dumpTab()
    end
end

local undumpFunction -- forward declaration

local function undumpByte(D)
    return D:undump("B")
end

local function undumpInteger(D)
    return D:undump("j")
end

local function undumpNumber(D)
    return D:undump("n")
end

local function undumpLiteral(D, len)
    return D:undump("c" .. len)
end

local function undumpSize(D)
    local v = 0
    repeat
        local b = D:undump("B")
        v = (v << 7) | (b & 0x7F)
    until (b & 0x80) > 0
    return v
end

local undumpInt = undumpSize

local function undumpString(D)
    local len = undumpSize(D)
    if len == 0 then
        return ""
    else
        return undumpLiteral(D, len-1)
    end
end

local function undumpCode(D)
    local size = undumpInt(D)
    
    local code = {}
    for i=1,size do
        code[i] = D:undump("I4")
    end
    
    return code
end

local function undumpConstants(D)
    local size = undumpInt(D)
    
    local constants = {}
    for i=1,size do
        local t = undumpByte(D)
        
        local switch = {
            [0x00] = function() -- nil
                return nil
            end,
            [0x01] = function() -- false
                return false
            end,
            [0x11] = function() -- true
                return true
            end,
            [0x03] = function() -- int
                return undumpInteger(D)
            end,
            [0x13] = function() -- float
                return undumpNumber(D)
            end,
            [0x04] = function() -- short string
                return undumpString(D)
            end,
            [0x14] = function() -- long string
                return undumpString(D)
            end
        }
        constants[i] = switch[t]()
    end
    
    return constants
end

local function undumpProtos(D)
    local size = undumpInt(D)
    
    local protos = {}
    for i=1,size do
        protos[i] = undumpFunction(D)
    end
    
    return protos
end

local function undumpUpvalues(D)
    local size = undumpInt(D)
    
    local upvalues = {}
    for i=1,size do
        upvalues[i] = {
            instack = undumpByte(D),
            idx = undumpByte(D),
            kind = undumpByte(D)
        }
    end
    
    return upvalues
end

local function undumpDebug(D, f)
    local sizelineinfo = undumpInt(D)
    if sizelineinfo == 0 then
        f.lineinfo = ""
    else
        f.lineinfo = undumpLiteral(D, sizelineinfo)
    end
    
    local sizeabslineinfo = undumpInt(D)
    f.abslineinfo = {}
    for i=1,sizeabslineinfo do
        f.abslineinfo[i] = {
            ["pc"] = undumpInt(D),
            ["line"] = undumpInt(D)
        }
    end
    
    local sizelocvars = undumpInt(D)
    f.locvars = {}
    for i=1,sizelocvars do
        f.locvars[i] = {
            varname = undumpString(D),
            startpc = undumpInt(D),
            endpc = undumpInt(D)
        }
    end
    
    local sizeupvalues = undumpInt(D)
    for i=1,sizeupvalues do
        f.upvalues[i].name = undumpString(D)
    end
end

undumpFunction = function(D)
    local f = {}
    
    f.source = undumpString(D)
    f.linedefined = undumpInt(D)
    f.lastlinedefined = undumpInt(D)
    f.numparams = undumpByte(D)
    f.is_vararg = undumpByte(D)
    f.maxstacksize = undumpByte(D)
    f.code = undumpCode(D)
    f.constants = undumpConstants(D)
    f.upvalues = undumpUpvalues(D)
    f.protos = undumpProtos(D)
    undumpDebug(D, f)
    return f
end

local function undumpHeader(D)
    assert(undumpLiteral(D, 4) == "\x1bLua") -- signature (LUA_SIGNATURE)
    assert(undumpByte(D) == 0x54) -- version (5.4)
    assert(undumpByte(D) == 0x00) -- format (0 = official format)
    assert(undumpLiteral(D, 6) == "\x19\x93\r\n\x1a\n") -- LUAC_DATA, catches conversion errors
    assert(undumpByte(D) == 4) -- instruction size
    assert(undumpByte(D) == 8) -- integer size
    assert(undumpByte(D) == 8) -- float size
    assert(undumpInteger(D) == 0x5678) -- LUAC_INT, checks integer format/endianness
    assert(undumpNumber(D) == 370.5) -- LUAC_NUM, checks float format

-- Undumps a binary code blob to a useful structure
function undump(bin)
    local D = {
        i = 1, -- next read position (string.unpack positions are 1-based)
        undump = function(self, fmt)
            local v, i = string.unpack(fmt, bin, self.i)
            self.i = i
            table.insert(tab,i) -- added table for format values
            return v
        end
    }
    undumpHeader(D)
    undumpByte(D) -- numupvalues
    return undumpFunction(D)    
end

function hexDump(str,st,en)
    fill(255)
    local x,y,v=0,1,0
    for z=st,en do
        x=x+1 
        if x>10 then
            x=1
            y=y+1
        end
        v=str:byte(z,z)
        if v>31 and v<127 then
            text(string.format("%02x-%s ",v,str:sub(z,z)),100+x*50,HEIGHT-20*y)
        else
            text(string.format("%02x-",v),100+x*50,HEIGHT-20*y)
        end
    end
end

function dumpTab()
    -- y starts at 2; s, e and v1 are declared local and start as nil
    local y,s,e,v1=2
    for z=1,#tab-1 do
        s=tab[z]
        e=tab[z+1]
        local str=""
        y=y+1
        text(string.format("%3d - %-3d",s,e-1),50+dx,HEIGHT-20*y+dy)
        for a=s,e-1 do
            v1=bytecode:byte(a,a)
            if v1>31 and v1<127 then
                str=str..string.format("%02x-%s ",v1,bytecode:sub(a,a))
            else
                str=str..string.format("%02x ",v1)
            end
        end
        text(string.format("%s",str),180+dx,HEIGHT-20*y+dy)
    end
end

function touched(t)
    if t.state==BEGAN and t.tapCount==3 then
        showHexDump=not showHexDump
    end
    if t.state==CHANGED then
        dx=dx+t.deltaX
        if dx>5 then
            dx=5
        end
        dy=dy+t.deltaY
        if dy<0 then
            dy=0
        end
    end
end
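
On the table reporting a size of 0: the # operator only counts the sequence part of a table (keys 1..n), so a table that holds only named fields, like D or the structure returned by undump, reports 0 even though it is not empty. A minimal sketch of listing such a table with pairs instead; dumpFields and the sample table are made up for illustration, and the recursion assumes the table has no cycles.

```lua
-- Collect "key = value" lines for every field of a table,
-- recursing into nested tables (no cycle detection).
local function dumpFields(t, indent, out)
    indent = indent or ""
    out = out or {}
    -- sort keys by their string form so the listing is stable
    local keys = {}
    for k in pairs(t) do keys[#keys + 1] = k end
    table.sort(keys, function(x, y) return tostring(x) < tostring(y) end)
    for _, k in ipairs(keys) do
        local v = t[k]
        if type(v) == "table" then
            out[#out + 1] = indent .. tostring(k) .. ":"
            dumpFields(v, indent .. "  ", out)
        else
            out[#out + 1] = indent .. tostring(k) .. " = " .. tostring(v)
        end
    end
    return out
end

-- Stand-in for the structure returned by undump(bin)
local sample = { numparams = 1, code = { 70, 71 }, source = "=?" }
print(#sample) -- only named fields, so the length operator gives 0
for _, line in ipairs(dumpFields(sample)) do
    print(line)
end
```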