# Optimising vector performance

I recently discovered that a simple three-line function, with one normalize and one dot call, could be sped up by over 4x by getting rid of the vectors.

By that I mean breaking vectors up into separate x, y, z (scalar) values and doing the calculations on each one separately, while also avoiding expensive operations like square roots.

So this

``````
function IsVisible(pos,radius) --pos is vec2
    local v=(p-cameraPos):normalize()
end
``````

can be made over 4x faster with this

``````
function IsVisible(pos,radius)
    local px,py=pos.x,pos.y
    local dx,dy=px-camposX,py-camposY
    if dx*camdirX+dy*camdirY<0 then return end
    local ptx,pty=px+camdirX*u-camposX,py+camdirY*u-camposY
    local sq=ptx*ptx+pty*pty
    local a=ptx*camdirX+pty*camdirY
    return a*a>cosFOV2*sq
end
``````
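The last line of the fast version is where the square root disappears: instead of normalising and comparing a cosine, it compares squared quantities. Here is a minimal sketch of that idea in plain Lua, assuming the slow test compares the cosine of the angle to the camera direction against a non-negative `cosFOV` threshold. The names `slowTest` and `fastTest` and the sample points are made up for illustration and are not from the original project.

```lua
-- Hypothetical stand-alone check: names and sample values are illustrative only.
local function slowTest(x, y, dirX, dirY, cosFOV)
    local len = math.sqrt(x * x + y * y)      -- normalising needs a square root
    return (x / len) * dirX + (y / len) * dirY > cosFOV
end

local function fastTest(x, y, dirX, dirY, cosFOV2)
    local a = x * dirX + y * dirY             -- unnormalised dot product
    if a < 0 then return false end            -- behind the camera; squaring would hide the sign
    return a * a > cosFOV2 * (x * x + y * y)  -- compare squares, no square root
end

local cosFOV = math.cos(math.rad(30))
local cosFOV2 = cosFOV * cosFOV
local samples = { {10, 1}, {10, 9}, {-10, 1}, {3, -1}, {1, 5} }

local allAgree = true
for _, p in ipairs(samples) do
    local s = slowTest(p[1], p[2], 1, 0, cosFOV)
    local f = fastTest(p[1], p[2], 1, 0, cosFOV2)
    if s ~= f then allAgree = false end
end
print(allAgree)  -- prints true
```

The two tests only agree when `cosFOV` is not negative (a field of view under 180 degrees), which is why the sign check has to come before the squared comparison.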

Simeon explains it like this:

The performance difference appears to be due to allocations, every vector mult / sub / add has to allocate a new vector object as Lua user data to return its results. The overhead in the allocations accounts for all the difference in performance.
I'm going to look into whether we can come up with an alternate memory allocator for lots of small short-lived objects.

Note that this problem exhibits itself because the vectors are short-lived and created / deleted constantly. Using vectors in a more long-term scenario should be totally fine (the overhead will not really be noticeable without lots of operations).
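To make the allocation point concrete, here is a rough sketch in plain Lua. It assumes a table-based stand-in for the vector type (Codea's `vec2` is userdata, so the numbers will differ) and uses `collectgarbage("count")` with the collector paused to see how much memory each loop leaves behind. `Vec`, `vec`, `measure`, `withVectors` and `withScalars` are all hypothetical names.

```lua
-- Stand-in for a vector type: a plain Lua table with a metatable.
-- (Codea's vec2 is userdata; this only mimics the allocation pattern.)
local Vec = {}
Vec.__index = Vec
local function vec(x, y) return setmetatable({ x = x, y = y }, Vec) end
Vec.__add = function(a, b) return vec(a.x + b.x, a.y + b.y) end
Vec.__mul = function(a, s) return vec(a.x * s, a.y * s) end

local N = 100000

-- Returns fn(N) and how much memory (in KB) grew while it ran.
local function measure(fn)
    collectgarbage("collect")
    collectgarbage("stop")                    -- keep garbage around so it can be counted
    local before = collectgarbage("count")
    local result = fn(N)
    local grown = collectgarbage("count") - before
    collectgarbage("restart")
    return result, grown
end

local function withVectors(n)
    local v1, v2, sum = vec(1, 2), vec(3, 4), 0
    for i = 1, n do
        local r = v1 * 2 + v2                 -- two temporary vectors per iteration
        sum = sum + r.x
    end
    return sum
end

local function withScalars(n)
    local x1, x2, sum = 1, 3, 0
    for i = 1, n do
        local rx = x1 * 2 + x2                -- same arithmetic, nothing allocated
        sum = sum + rx
    end
    return sum
end

local vecSum, vecKB = measure(withVectors)
local scaSum, scaKB = measure(withScalars)
print(vecSum == scaSum, vecKB > scaKB)
```

Both loops compute the same sum, but the vector loop creates a temporary object for every `*` and `+`, which is the short-lived allocation pattern described above.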

You haven’t linked to the blog post!

https://coolcodea.wordpress.com/2015/11/17/optimal-culling/

Thanks for the shout out.

A 4x increase is pretty amazing. I can see I’m going to have to rewrite chunks of code where I’m having performance issues. It’s a shame in a way, though, as the 3-line version of the function is so much more readable than the fast version.

Just a note that you should be able to do:

``````
local px, py = pos:unpack()
``````

To get the elements out of a vector for the purposes of writing a decomposed version of a function.
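As a rough illustration of that pattern, here is a sketch in plain Lua. It assumes a table-based stand-in with an `unpack` method, since the real `vec2` only exists inside Codea; `Vec`, `vec` and `distSq` are hypothetical names.

```lua
-- Stand-in for a vector type with an unpack method (hypothetical; in Codea
-- the built-in vec2 would be used instead).
local Vec = {}
Vec.__index = Vec
local function vec(x, y) return setmetatable({ x = x, y = y }, Vec) end
function Vec:unpack() return self.x, self.y end

-- Decomposed squared distance: unpack once, then work with scalars only.
local function distSq(a, b)
    local ax, ay = a:unpack()
    local bx, by = b:unpack()
    local dx, dy = bx - ax, by - ay
    return dx * dx + dy * dy
end

print(distSq(vec(1, 2), vec(4, 6)))  -- 3*3 + 4*4
```

Unpacking at the top of the function keeps the vector interface for callers while the body runs on plain numbers.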

I did some more testing, and while functions like normalize seem to be fairly optimal, vector arithmetic can be slower than scalar equivalents.

For example, `v1*a+v2` can be much slower than `vec2(v1.x*a+v2.x,v1.y*a+v2.y)`

(Also, for some reason, `a^0.5` is way faster than `sqrt(a)`, even if you’ve localized `sqrt`.)
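If you want to check the square root claim on your own device, here is a small timing sketch in plain Lua. It assumes `os.clock` is a good enough timer and that a million iterations is enough to see a difference; the variable names are made up, and the timings will vary by platform and Lua version, so treat the result as a measurement rather than a rule.

```lua
local sqrt = math.sqrt                        -- localised, as in the post
local N = 1000000

-- Time the power operator.
local start = os.clock()
local sumPow = 0
for i = 1, N do sumPow = sumPow + i ^ 0.5 end
local tPow = os.clock() - start

-- Time the localised library call over the same inputs.
start = os.clock()
local sumSqrt = 0
for i = 1, N do sumSqrt = sumSqrt + sqrt(i) end
local tSqrt = os.clock() - start

-- Both loops should produce the same sum to within rounding.
local sameResult = math.abs(sumPow - sumSqrt) <= 1e-9 * sumSqrt
print(sameResult)
print(string.format("a^0.5: %.3fs  sqrt(a): %.3fs", tPow, tSqrt))
```

The loops are written out inline rather than wrapped in a helper function, so the function call overhead doesn't swamp the thing being measured.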