Experiments with vec2 userdata

I rewrote some code to use Codea’s vec2 userdata type for its vectors, and it seemed to run much slower as a result. That is not what I had expected. The code below explores that further:


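--
-- Codea's vec2 userdata: time the same distance calculation done with
-- vec2 operators, vec2 fields, vec2:dist, plain tables and plain numbers
--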
function setup()
    local n = 100000
    local d
    local v1 = vec2(1, 2)
    local v2 = vec2(4, 5)
    local v1x = v1.x
    local v1y = v1.y
    local v2x = v2.x
    local v2y = v2.y
    local tb1 = {x=1, y=2}
    local tb2 = {x=4, y=5}
    
    print("Vectors - minus and len")
    t1 = os.clock()
    d = 0
    for i = 1, n do
        local v3 = v2 - v1
        d = d + v3:len()
    end
    dt1 = os.clock() - t1
    print("Result:"..d) 
    print(dt1)
    print()
    
    print("Vectors - partial")
    t2 = os.clock()
    d = 0
    for i = 1, n do
        local v3x = v2.x - v1.x
        local v3y = v2.y - v1.y
        d = d + math.sqrt(v3x*v3x + v3y*v3y)
    end
    dt2 = os.clock() - t2
    print("Result:"..d) 
    print(dt2)    
    print("Saving (%):", (1 - dt2/dt1)*100)
    print()
    
    print("Vectors - dist")
    t3 = os.clock()
    d = 0
    for i = 1, n do
        d = d + v2:dist(v1)
    end
    dt3 = os.clock() - t3
    print("Result:"..d) 
    print(dt3)
    print("Saving (%):", (1 - dt3/dt1)*100)
    print()
    
    print("Tables")
    t4 = os.clock()
    d = 0
    for i = 1, n do
        local v3x = tb2.x - tb1.x
        local v3y = tb2.y - tb1.y
        d = d + math.sqrt(v3x*v3x + v3y*v3y)
    end
    dt4 = os.clock() - t4
    print("Result:"..d) 
    print(dt4)    
    print("Saving (%):", (1 - dt4/dt1)*100)
    print()
    
    print("Pure number types")
    t5 = os.clock()
    d = 0
    for i = 1, n do
        local v3x = v2x - v1x
        local v3y = v2y - v1y
        d = d + math.sqrt(v3x*v3x + v3y*v3y)
    end
    dt5 = os.clock() - t5
    print("Result:"..d)
    print(dt5)    
    print("Saving (%):", (1 - dt5/dt1)*100)     
end

function draw()
    background(0)
end

On my iPad 2, this gives the following output:


Vectors - minus and len
Result:424760
0.560913

Vectors - partial
Result:424760
0.409302
Saving (%):	27.0294

Vectors - dist
Result:424760
0.261353
Saving (%):	53.4059

Tables
Result:424760
0.111877
Saving (%):	80.0544

Pure number types
Result:424760
0.0709839
Saving (%):	87.3449

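It seems that vec2 comes at a price, and the price is speed: in this test the plain vec2 version took roughly eight times as long as the version using plain numbers.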

Thanks for doing this; I’ve wondered about the performance implications of using vec2. Since Lua allows multiple return values, I wonder whether it would be better to have a set of functions that take and return vectors as their individual components instead. It would probably be a lot less convenient, though.
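The component-based idea can be sketched in plain Lua. The helper names sub2, len2 and dist2 are hypothetical, not part of Codea; each vector travels as two numbers and results come back as multiple return values, so nothing is allocated per call.

```lua
-- Component-based 2D vector helpers (hypothetical names, not a Codea API).
-- Each vector is passed as separate x, y numbers and results come back as
-- multiple return values, so no table or userdata is created per call.
local sqrt = math.sqrt

-- (ax, ay) - (bx, by), returned as two numbers
local function sub2(ax, ay, bx, by)
    return ax - bx, ay - by
end

-- Length of the vector (x, y)
local function len2(x, y)
    return sqrt(x * x + y * y)
end

-- Distance between two points: sub2's two results feed straight into len2
local function dist2(ax, ay, bx, by)
    return len2(sub2(ax, ay, bx, by))
end

-- Same loop as the benchmarks above, with v1 = (1, 2) and v2 = (4, 5)
local d = 0
for i = 1, 100000 do
    d = d + dist2(4, 5, 1, 2)
end
print("Result:" .. d)
```

Chained calls like dist2 stay allocation-free, but every call site has to spell out both components, which is the convenience cost.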

That’s really interesting. I want to run some tests too, because I was fairly sure (though I never actually checked) that vec2 math would be faster than Lua math on plain numbers or tables, partly because of some discussion on this forum. One thing that might make these calculations faster with vec2 would be avoiding the creation of a new vec2 object every time (which I suspect is the real cause of the performance problem), for example by having methods that apply transformations (rotate, translate, etc.) in place on the same vec2, or that write the result into a vec2 passed as a parameter. @Simeon, what do you think about @mpilgrem’s results?
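The in-place idea can be sketched in plain Lua. Plain tables stand in for vec2 here, and subInto and rotateInPlace are hypothetical helpers, not Codea methods; the point is only that a scratch vector allocated once, outside the loop, is reused on every iteration, so the loop itself creates no new objects.

```lua
-- Sketch: write results into an existing table instead of allocating a new
-- vector each time. subInto and rotateInPlace are hypothetical, not Codea API.

-- Store a - b in out and return out; out may be the same table as a or b.
local function subInto(out, a, b)
    out.x = a.x - b.x
    out.y = a.y - b.y
    return out
end

-- Rotate v in place by angle (radians) about the origin and return v.
local function rotateInPlace(v, angle)
    local c, s = math.cos(angle), math.sin(angle)
    local x, y = v.x, v.y
    v.x = x * c - y * s
    v.y = x * s + y * c
    return v
end

local tb1 = {x = 1, y = 2}
local tb2 = {x = 4, y = 5}
local tmp = {x = 0, y = 0}   -- scratch vector, allocated once
local sqrt = math.sqrt
local d = 0
for i = 1, 100000 do
    subInto(tmp, tb2, tb1)   -- reuses tmp, no allocation in the loop
    d = d + sqrt(tmp.x * tmp.x + tmp.y * tmp.y)
end
print("Result:" .. d)
```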

Those are very interesting results. I suspect there may be a lot of overhead in constructing a new userdata object, as well as in calling out to C. So for the kind of simple calculations you’re performing, the overhead outweighs the benefits.

This is the source code for our vec2 implementation (from the Codea Runtime Library): https://github.com/TwoLivesLeft/Codea-Runtime/blob/master/CodeaTemplate/LuaLibs/vec2.c

Perhaps performance would be better if we rewrote this as a pure-Lua library?
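A pure-Lua library might look something like the following minimal sketch. It uses a metatable so that v2 - v1, v:len() and v:dist(w) read the same as with Codea’s vec2. Vec2 and vec are hypothetical names, and this is not the Codea Runtime implementation. Whether it actually beats the C userdata would have to be measured with the benchmark above, since a - b still allocates a new table on every call.

```lua
-- Minimal pure-Lua 2D vector, sketching what a Lua replacement for vec2
-- might look like. Vec2 and vec are hypothetical, not Codea's API.
local Vec2 = {}
Vec2.__index = Vec2   -- method lookups fall through to Vec2

-- Constructor: a plain table with x and y fields sharing the Vec2 metatable
local function vec(x, y)
    return setmetatable({x = x, y = y}, Vec2)
end

-- a - b and a + b allocate a new vector, just like the userdata version
function Vec2.__sub(a, b)
    return vec(a.x - b.x, a.y - b.y)
end

function Vec2.__add(a, b)
    return vec(a.x + b.x, a.y + b.y)
end

function Vec2:len()
    return math.sqrt(self.x * self.x + self.y * self.y)
end

-- Distance computed directly, without building an intermediate vector
function Vec2:dist(other)
    local dx, dy = other.x - self.x, other.y - self.y
    return math.sqrt(dx * dx + dy * dy)
end

-- Same "minus and len" loop as the first benchmark
local v1, v2 = vec(1, 2), vec(4, 5)
local t = os.clock()
local d = 0
for i = 1, 100000 do
    d = d + (v2 - v1):len()
end
print("Result:" .. d, os.clock() - t)
```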

You can vastly improve the performance of the vec2 benchmarks by caching the methods in local variables. The biggest slowdown is looking up the vec2 members (such as len and dist) on every iteration.

    print("Vectors - minus and len")
    t1 = os.clock()
    d = 0
    -- Look up the method once, outside the loop
    local len = v2.len
    local v3
    for i = 1, n do
        v3 = v2 - v1
        d = d + len(v3)
    end
    dt1 = os.clock() - t1
    print("Result:"..d)
    print(dt1)
    print()

    print("Vectors - dist")
    t3 = os.clock()
    d = 0
    -- Same idea: cache v2.dist in a local and call it as a plain function
    local dist = v2.dist
    for i = 1, n do
        d = d + dist(v2, v1)
    end
    dt3 = os.clock() - t3
    print("Result:"..d)
    print(dt3)
    print("Saving (%):", (1 - dt3/dt1)*100)
    print()

This gives me a saving of ~76% on that particular test.

Caching the methods is not ideal, though, but in tight loops it may be necessary for good performance.
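The same local-caching idea applies to library functions such as math.sqrt, which the number and table benchmarks call inside their loops. This sketch runs one loop that looks up math.sqrt on every iteration and one that copies it into a local first; the function names uncached and cached are just for illustration, and no timings are claimed here.

```lua
-- Sketch: local caching applied to a library function.
local n = 100000
local v1x, v1y, v2x, v2y = 1, 2, 4, 5

-- Looks up the global "math", then its "sqrt" field, on every iteration
local function uncached()
    local d = 0
    for i = 1, n do
        local dx, dy = v2x - v1x, v2y - v1y
        d = d + math.sqrt(dx * dx + dy * dy)
    end
    return d
end

-- Looks up math.sqrt once, before the loop, and keeps it in a local
local function cached()
    local sqrt = math.sqrt
    local d = 0
    for i = 1, n do
        local dx, dy = v2x - v1x, v2y - v1y
        d = d + sqrt(dx * dx + dy * dy)
    end
    return d
end

local t = os.clock()
local d1 = uncached()
local dt1 = os.clock() - t

t = os.clock()
local d2 = cached()
local dt2 = os.clock() - t

print("Result:" .. d1, dt1)
print("Result:" .. d2, dt2)
```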

This appears to be due to Lua’s luaL_checkudata call, which validates the vec2 type for safety before attempting to perform the operation.

It’s quite a slow call; I’m going to try to find a way to work around this while keeping a safety check.

Edit: I was able to speed up the built-in vectors so that they are faster than the “Pure number” and table solutions. However, Codea could then crash if an incorrect type is passed in (for example, a vec2 passed to a vec4 length function). I’m unsure whether it is worth sacrificing stability for speed.

@Simeon, maybe add a global setting function, vec2check(boolean), that defaults to true? It would stay true while writing and debugging, and could be set to false once the game is ready.
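A switch like that can be sketched for a pure-Lua vector type. vec2check, Vec2, vec and checkVec here are hypothetical names, and this is not how Codea’s C vec2 works. In pure Lua, skipping the check cannot crash Codea; a wrong argument produces at worst a less helpful Lua error or a silently wrong result.

```lua
-- Sketch of a toggleable argument check for a pure-Lua vector type.
-- vec2check, Vec2, vec and checkVec are hypothetical, not Codea's API.
local Vec2 = {}
Vec2.__index = Vec2

local checking = true  -- default: validate arguments

-- Global switch: pass false to skip validation once the game is debugged
function vec2check(enabled)
    checking = enabled
end

-- Raise an error, reported at the caller of the vector method (level 3),
-- if checking is on and v is not a Vec2
local function checkVec(v, argName)
    if checking and getmetatable(v) ~= Vec2 then
        error("bad argument '" .. argName .. "' (vec2 expected, got "
              .. type(v) .. ")", 3)
    end
end

local function vec(x, y)
    return setmetatable({x = x, y = y}, Vec2)
end

function Vec2.dist(a, b)
    checkVec(a, "a")
    checkVec(b, "b")
    local dx, dy = b.x - a.x, b.y - a.y
    return math.sqrt(dx * dx + dy * dy)
end

local p, q = vec(1, 2), vec(4, 5)
print(p:dist(q))                          -- checked call
print(pcall(Vec2.dist, p, {x = 4, y = 5})) -- plain table is rejected
```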

I have an experimental fix that is still safe to use.

It’s still always going to be faster to cache the methods in locals before entering a tight loop, though. But that’s just the way Lua is.