Are you a C wizard? Codea needs your help

Ignatz · November 18, 2015, 1:37am

Simeon is looking at improving vector performance in Codea, but it is a complex problem.

If you know enough about C to help with the challenge below, please message Simeon.

He says …

I spent some time looking at solutions for removing the need for allocations on vector objects.

The problem is that vec2 (and friends) are Lua “user data” objects, which are basically Lua-managed interfaces to C code and data. Performing an operation on them has to return a result, which means asking Lua to create a new user data. This causes Lua to allocate memory with its memory allocator, and that’s the source of the performance difference.
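To make the allocation cost concrete, here is a rough sketch in plain C. It does not use the Lua C API: `Vec2`, `vec2_new` and the allocation counter are hypothetical stand-ins, with `malloc` playing the role of Lua creating a new user data for every result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Lua-managed vec2 user data. */
typedef struct { float x, y; } Vec2;

/* Counts heap allocations so the cost of temporaries is visible. */
static size_t g_alloc_count = 0;

/* Stand-in for asking Lua for a new user data: one allocation per result. */
Vec2 *vec2_new(float x, float y) {
    Vec2 *v = malloc(sizeof *v);
    assert(v != NULL);
    g_alloc_count++;
    v->x = x;
    v->y = y;
    return v;
}

/* Each operator returns a freshly allocated result, as a user data would. */
Vec2 *vec2_add(const Vec2 *a, const Vec2 *b) {
    return vec2_new(a->x + b->x, a->y + b->y);
}

Vec2 *vec2_scale(const Vec2 *a, float s) {
    return vec2_new(a->x * s, a->y * s);
}

/* Evaluates a + b * 2 and returns how many allocations the expression made. */
size_t allocations_for_expression(void) {
    Vec2 *a = vec2_new(1.0f, 2.0f);
    Vec2 *b = vec2_new(3.0f, 4.0f);
    size_t before = g_alloc_count;

    Vec2 *tmp = vec2_scale(b, 2.0f);   /* temporary for b * 2 */
    Vec2 *result = vec2_add(a, tmp);   /* final result */
    size_t used = g_alloc_count - before;

    /* In Lua the garbage collector would reclaim these instead. */
    free(tmp);
    free(result);
    free(a);
    free(b);
    return used;
}
```

Under these assumptions a two-operator expression costs two allocations, one of them for a temporary that is discarded immediately. Inside a loop that runs thousands of times a frame, that adds up.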

Out of curiosity, I wrote a pure-Lua vec2 implementation and tested it on your benchmark. It was extremely slow (1000 iterations/frame vs. 8000 for user data).

There are only a few solutions I can think of, and some of them seem a bit infeasible.

Modify the Lua interpreter to support native vector types. This would be pretty tricky and make keeping up-to-date with future versions of Lua quite hard.
Find a faster memory allocator. I tried using LuaJIT’s memory allocator with Lua and it seems a little faster, though not much. I’m pretty sure the generalised malloc/realloc/free are as fast as they get.
Use a custom memory allocator that draws from a static pool rather than the heap. This would require some fixed memory size from which vectors can be drawn. The problem is you’d probably want to make this explicit in the code, or you’ll run out of pool memory in general purpose applications.

Something like:

vec2.pool(20)

-- Do expensive vector loop, only 20 vectors can be allocated before old ones are reused

vec2.flush()

This is hard to implement in a bug-free manner though.
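On the C side, the pool idea could look roughly like the sketch below. It is only an illustration under assumptions: `Vec2Pool`, `pool_acquire`, `pool_add` and `pool_flush` are made-up names, the capacity is fixed at compile time rather than set by `vec2.pool(20)`, and nothing here is wired into Lua.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;

/* Assumed fixed capacity; a real vec2.pool(n) would set this at runtime. */
#define POOL_CAPACITY 20

typedef struct {
    Vec2 slots[POOL_CAPACITY]; /* static storage, no heap involved */
    size_t next;               /* index of the next slot to hand out */
} Vec2Pool;

/* Resets the pool so the next acquire starts from the first slot. */
void pool_flush(Vec2Pool *p) {
    assert(p != NULL);
    p->next = 0;
}

/* Hands out the next slot, wrapping around and reusing the oldest one. */
Vec2 *pool_acquire(Vec2Pool *p, float x, float y) {
    assert(p != NULL);
    Vec2 *v = &p->slots[p->next];
    p->next = (p->next + 1) % POOL_CAPACITY;
    v->x = x;
    v->y = y;
    return v;
}

/* Vector add whose result lives in the pool instead of on the heap.
   The sums are computed before the slot is overwritten, so the result is
   correct even if the slot handed out happens to be a or b. */
Vec2 *pool_add(Vec2Pool *p, const Vec2 *a, const Vec2 *b) {
    return pool_acquire(p, a->x + b->x, a->y + b->y);
}
```

The wrap-around is also where the bugs come from: any vector still held after `POOL_CAPACITY` further acquisitions is silently overwritten, which is why the pool would probably have to be explicit in user code.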

Use a vector-expression API where you pre-construct the expression tree, then evaluate the expression with vectors as arguments, e.g.:

local expr = vec2("someVec + someOther * (radius * 2.6) - pos")

-- In the loop

local v = expr:evaluate(a, b, c, d, e)

This is a bit clunky though, and it still allocates at least one new vector (v).
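A minimal sketch of how the evaluation side might work in C, assuming the expression string has already been parsed into a flat postfix program. The parser is omitted, and the opcode names, `Op` layout and `expr_evaluate` are all hypothetical. Intermediate results live on a small fixed stack, so only the final result would need a new user data.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y; } Vec2;

/* Hypothetical opcodes for a pre-compiled vector expression. */
typedef enum {
    OP_PUSH_VEC,       /* push vector argument number `index` */
    OP_ADD,            /* pop two vectors, push their sum */
    OP_SUB,            /* pop two vectors, push their difference */
    OP_SCALE_BY_ARG,   /* scale top vector by scalar argument `index` */
    OP_SCALE_BY_CONST  /* scale top vector by `constant` */
} OpCode;

typedef struct { OpCode code; int index; float constant; } Op;

#define STACK_MAX 16

/* Postfix form of: someVec + someOther * (radius * 2.6) - pos
   with vector args 0..2 = someVec, someOther, pos; scalar arg 0 = radius. */
static const Op EXAMPLE_PROGRAM[] = {
    {OP_PUSH_VEC, 0, 0.0f},
    {OP_PUSH_VEC, 1, 0.0f},
    {OP_SCALE_BY_ARG, 0, 0.0f},
    {OP_SCALE_BY_CONST, 0, 2.6f},
    {OP_ADD, 0, 0.0f},
    {OP_PUSH_VEC, 2, 0.0f},
    {OP_SUB, 0, 0.0f},
};
#define EXAMPLE_PROGRAM_LEN (sizeof EXAMPLE_PROGRAM / sizeof EXAMPLE_PROGRAM[0])

/* Runs the program; all temporaries stay on the local stack. */
Vec2 expr_evaluate(const Op *ops, size_t n_ops,
                   const Vec2 *vecs, const float *scalars) {
    Vec2 stack[STACK_MAX];
    size_t top = 0;
    for (size_t i = 0; i < n_ops; i++) {
        const Op *op = &ops[i];
        switch (op->code) {
        case OP_PUSH_VEC:
            assert(top < STACK_MAX);
            stack[top++] = vecs[op->index];
            break;
        case OP_ADD:
            assert(top >= 2);
            stack[top - 2].x += stack[top - 1].x;
            stack[top - 2].y += stack[top - 1].y;
            top--;
            break;
        case OP_SUB:
            assert(top >= 2);
            stack[top - 2].x -= stack[top - 1].x;
            stack[top - 2].y -= stack[top - 1].y;
            top--;
            break;
        case OP_SCALE_BY_ARG:
            assert(top >= 1);
            stack[top - 1].x *= scalars[op->index];
            stack[top - 1].y *= scalars[op->index];
            break;
        case OP_SCALE_BY_CONST:
            assert(top >= 1);
            stack[top - 1].x *= op->constant;
            stack[top - 1].y *= op->constant;
            break;
        }
    }
    assert(top == 1); /* a well-formed program leaves exactly one value */
    return stack[0];
}
```

The trade-off is the one already mentioned: the arithmetic no longer creates temporaries, but the caller has to build the program up front and pass arguments by position.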

sim · November 18, 2015, 2:17am

@Ignatz I’d still recommend using vectors and matrix types over unravelling them yourself because it makes your code cleaner.

Only once you actually do encounter performance issues should you look to optimise your code, and I expect in the vast majority of cases performance will not be an issue.

I think your case (geometry culling?) is quite exceptional and certainly warrants the optimisations you made.

Ignatz · November 18, 2015, 2:32am

I agree, I’d only look at decomposing vectors when I’m running functions thousands of times a frame.

And I’m certainly not complaining. Codea runs very fast as it is!