I rewrote some code to use vectors, via Codea’s vec2 userdata type, and it seemed to run much slower as a result. That is not what I had expected. The code below explores this further:
--
-- Codea's vec2 userdata
--
function setup()
    local n = 100000
    local d
    local v1 = vec2(1, 2)
    local v2 = vec2(4, 5)
    local v1x = v1.x
    local v1y = v1.y
    local v2x = v2.x
    local v2y = v2.y
    local tb1 = {x=1, y=2}
    local tb2 = {x=4, y=5}
    print("Vectors - minus and len")
    t1 = os.clock()
    d = 0
    for i = 1, n do
        local v3 = v2 - v1
        d = d + v3:len()
    end
    dt1 = os.clock() - t1
    print("Result:"..d)
    print(dt1)
    print()
    print("Vectors - partial")
    t2 = os.clock()
    d = 0
    for i = 1, n do
        local v3x = v2.x - v1.x
        local v3y = v2.y - v1.y
        d = d + math.sqrt(v3x*v3x + v3y*v3y)
    end
    dt2 = os.clock() - t2
    print("Result:"..d)
    print(dt2)
    print("Saving (%):", (1 - dt2/dt1)*100)
    print()
    print("Vectors - dist")
    t3 = os.clock()
    d = 0
    for i = 1, n do
        d = d + v2:dist(v1)
    end
    dt3 = os.clock() - t3
    print("Result:"..d)
    print(dt3)
    print("Saving (%):", (1 - dt3/dt1)*100)
    print()
    print("Tables")
    t4 = os.clock()
    d = 0
    for i = 1, n do
        local v3x = tb2.x - tb1.x
        local v3y = tb2.y - tb1.y
        d = d + math.sqrt(v3x*v3x + v3y*v3y)
    end
    dt4 = os.clock() - t4
    print("Result:"..d)
    print(dt4)
    print("Saving (%):", (1 - dt4/dt1)*100)
    print()
    print("Pure number types")
    t5 = os.clock()
    d = 0
    for i = 1, n do
        local v3x = v2x - v1x
        local v3y = v2y - v1y
        d = d + math.sqrt(v3x*v3x + v3y*v3y)
    end
    dt5 = os.clock() - t5
    print("Result:"..d)
    print(dt5)
    print("Saving (%):", (1 - dt5/dt1)*100)
end

function draw()
    background(0)
end
On my iPad2, this gives the following output:
Vectors - minus and len
Result:424760
0.560913
Vectors - partial
Result:424760
0.409302
Saving (%): 27.0294
Vectors - dist
Result:424760
0.261353
Saving (%): 53.4059
Tables
Result:424760
0.111877
Saving (%): 80.0544
Pure number types
Result:424760
0.0709839
Saving (%): 87.3449
It seems that vec2 comes at a price, the cost being speed.
Thanks for doing this; I’ve wondered about the performance implications of using vec2. Since Lua allows multiple return values, I wonder if it would be better to have a set of functions that take and return vectors by their individual components instead. It would probably be a lot less convenient, though.
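To make the component-wise idea concrete, here is a minimal sketch in plain Lua. The names vsub and vlen are made up for illustration and are not part of Codea; the point is only that multiple return values let one function feed its x and y straight into the next without creating a vector object.

```lua
-- Hypothetical helpers (not part of Codea): a 2D vector is passed around
-- as two separate numbers instead of a vec2 or a table.
function vsub(ax, ay, bx, by)
    return ax - bx, ay - by
end

function vlen(x, y)
    return math.sqrt(x * x + y * y)
end

-- vsub is the last argument, so both of its results are passed on to vlen.
local d = vlen(vsub(4, 5, 1, 2))
print(d)
```

The inconvenience shows up quickly: every function signature doubles in length, and a call that is not the last argument is truncated to its first return value.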
That’s really interesting. I want to run some tests too, because I was pretty sure (though, wrongly, I never checked) that vec2 math would be faster than Lua math on plain numbers, tables, etc., partly because of some discussions on this forum. Something that could probably make this calculation faster with vec2 would be the ability to avoid creating a new vec2 object each time (which I fear is the real cause of the performance problem), for example methods that apply transformations (rotate, translate, etc.) directly to the same vec2, or to a vec2 passed in as a parameter. @Simeon, what do you think about @mpilgrem’s results?
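A rough sketch of what "write into an existing vector" could look like, using plain tables rather than vec2. The function subInto is hypothetical; as far as this thread shows, vec2 has no such method. It only illustrates reusing one scratch object across the loop instead of allocating a new one per iteration.

```lua
-- Hypothetical helper: writes a - b into 'out' and returns 'out',
-- so no new table is created per call.
function subInto(a, b, out)
    out.x = a.x - b.x
    out.y = a.y - b.y
    return out
end

local a = {x = 4, y = 5}
local b = {x = 1, y = 2}
local scratch = {x = 0, y = 0}   -- allocated once, reused on every iteration
local d = 0
for i = 1, 1000 do
    subInto(a, b, scratch)
    d = d + math.sqrt(scratch.x * scratch.x + scratch.y * scratch.y)
end
print(d)
```

Whether this would help for vec2 itself depends on how much of the cost is allocation and how much is the call into C, which this sketch does not measure.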
Those are very interesting results. I suspect there may be a lot of overhead when constructing a new userdata type, as well as calling out to C. So for the types of simple calculations you’re performing, the overhead outweighs the benefits.
You can vastly improve the performance of the vec2 benchmarks by locally caching the functions. The biggest slowdown is lookups on the vec2 members.
print("Vectors - minus and len")
t1 = os.clock()
d = 0
local len = v2.len
local v3 = nil
for i = 1, n do
    v3 = v2 - v1
    d = d + len(v3)
end
dt1 = os.clock() - t1
print("Result:"..d)
print(dt1)
print()
print("Vectors - dist")
t3 = os.clock()
d = 0
local dist = v2.dist
for i = 1, n do
    d = d + dist(v2, v1)
end
dt3 = os.clock() - t3
print("Result:"..d)
print(dt3)
print("Saving (%):", (1 - dt3/dt1)*100)
print()
This gives me a saving of ~76% on that particular test.
Caching the function is not ideal, but for tight loops it might be necessary for good performance.
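The same caching pattern works in plain Lua outside Codea. The sketch below uses math.sqrt as a stand-in for a vec2 method, and sumLengths is a made-up name; keeping the cached local inside a function limits how far the shortcut leaks into the rest of the code.

```lua
-- Illustrative only: math.sqrt stands in for a vec2 method such as len.
function sumLengths(n, x, y)
    local sqrt = math.sqrt   -- one table lookup here instead of one per iteration
    local d = 0
    for i = 1, n do
        d = d + sqrt(x * x + y * y)
    end
    return d
end

print(sumLengths(100000, 3, 3))
```

The gain from caching a plain table field like this is usually much smaller than the vec2 figure quoted above, since a userdata member lookup goes through a metamethod.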
This appears to be due to Lua’s luaL_checkudata call, which validates the vec2 type for safety before attempting to perform the operation.
It’s quite a slow call; I’m going to try to find a way to work around this while keeping a safety check.
Edit: I am able to speed up the built-in vectors so that they are faster than the “Pure number” and table solutions. However, Codea could then be crashed by passing in an incorrect type (for example, passing a vec2 into a vec4 length function). I’m unsure whether it would be worth sacrificing stability for speed.
@Simeon, maybe add a global setting function, vec2check(boolean), that defaults to true? It would stay true while writing and debugging, and be set to false once the game is ready.
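As a pure-Lua analogue of that suggestion, the sketch below guards a method with a type check that can be switched off. Everything here is hypothetical: vec2check is only the name proposed above, not an existing Codea function, and newVec and the Vec2 table are stand-ins for the real userdata, whose check happens in C.

```lua
-- Hypothetical stand-in for vec2: a table with a shared metatable.
local Vec2 = {}
Vec2.__index = Vec2

local checksEnabled = true   -- default: checks on, as proposed

-- Name taken from the suggestion above; not an existing Codea function.
function vec2check(enabled)
    checksEnabled = enabled
end

function newVec(x, y)
    return setmetatable({x = x, y = y}, Vec2)
end

function Vec2.len(v)
    -- The check compares metatables, loosely mirroring a userdata type check.
    if checksEnabled and getmetatable(v) ~= Vec2 then
        error("vec2 expected", 2)
    end
    return math.sqrt(v.x * v.x + v.y * v.y)
end

local v = newVec(3, 4)
print(v:len())
print(pcall(v.len, {x = 1, y = 1}))   -- fails while checks are enabled
vec2check(false)
print(pcall(v.len, {x = 1, y = 1}))   -- passes once checks are disabled
```

In this Lua version a skipped check just lets a wrong table through; in the C implementation the equivalent mistake could read invalid memory, which is the crash risk described above.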