Fastest 3D angle/axis rotation

I was writing some code that builds some meshes from rotated 3D vectors and was curious how to best express that in Lua/Codea (most of my programming is with compiled languages, where the costs must be evaluated differently). Anyway, after benchmarking a handful alternatives, I found (somewhat unsurprisingly) that the fastest options are to rely as much as possible on Codea’s composite library functions. In my case, the speed of the rotations proper is more important than the determination of the rotation matrix/vectors, but it did benchmark both parts of the problem.

Here is the code I settled upon:

function applyMatrixRowsToVec3(v, r1, r2, r3)
-- Let M be a matrix with rows r1, r2, and r3. Return M*v.
    return vec3(v:dot(r1), v:dot(r2), v:dot(r3))

function rotationVectors(u, angle)
-- Returns three 3D vectors corresponding to the rows of a matrix
-- implementing a rotation of the given angle around the given axis.
    local m = matrix()
    m = m:rotate(angle, u.x, u.y, u.z)
    return  vec3(m[1], m[2], m[3]),
            vec3(m[5], m[6], m[7]),
            vec3(m[9], m[10], m[11])
    -- The following is pretty efficient, but using whole-matrix operations as is
    -- done above is faster still. Also, this expanded form requires a unit vector u.
    local c = math.cos(angle)
    local s = math.sin(angle)
    local d = 1-c
    local su = s*u
    local du = d*u
    local r1 = du.x*u + vec3(c, su.z, -su.y)
    local r2 = du.y*u + vec3(-su.z, c, su.x)
    local r3 = du.z*u + vec3(su.y, -su.x, c)
    return r1, r2, r3

function rotateVec3(v, u, angle)
--  Return vector v rotated angle degrees around vector u.
    return applyMatrixRowsToVec3(v, rotationVectors(u, angle))

I hope this is helpful. I suspect if Codea implemented a native matrix-vector multiply, we’d get another 2x or so speedup for the actual rotation. Also, you gain 10% or so by expanding these expressions inline.

@daveedvdv - that’s cool. I’ve just spent months working with Andrew Stacey on using quaternions to do complex rotations for flying, and while [I understand] that quaternions have the advantage of smoother interpolation, your code is extremely compact.

What will help people (including myself) to understand how to use this approach properly is a practical example of how you would derive the axes and angles.

@daveedvdv - I’m interested in the situation where you are accumulating rotations, eg flying something and rotating in different directions, so you are constantly adding new rotations.

It seems the only way to combine rotations with axis-angle is to convert both of them to quaternions, multiply them, and convert back to axis-angle. That doesn’t sound very efficient.

Do you have something better?

Also, when evaluating performance always keep in mind if it’s important. For specialised cases custom coding for max efficiency really matters, but in a lot, if it’s not high frequency usage, simpler to read, easy to maintain code can be much more important than speed.

Quaternions are the best internal representation for rotations if you’ll have to compose many of them. The alternative of multiplying rotation matrices has two notable disadvantages: (a) it’s more costly (in pure operation count and size of representation, at least), and (b) it’s unstable for long composition sequences. Point (b) is particularly severe in your case where you’d presumably be accumulating thousands (or even millions) of rotations as the program progresses: With a straight matrix-multiply you’d quickly end up with matrices that are not unitary, let alone rotations.

That said, say you have accumulated a bunch of rotations into a quaternion q and now you want to apply those rotations to N vectors. Unless N is very small, the most efficient way to do that would be to compute a rotation matrix from your quaternion and then multiply the vectors by that resulting matrix (always multiplying with the original vectors; i.e., v2 = Mv1, and not v = Mv). My tests indicate that that matrix multiply is best (i.e., most efficiently) done with code like applyMatrixRowsToVec3 in my code above; it’s more than twice as fast as writing out the element-wise multiplications and additions.

I’m not sure about your question regarding deriving the axis and angle. The reason that I have this code is because that’s what I happen to have: I want to generate cylindrical meshes give a radius and a direction. If I had some other representation of a rotation (quaternions, Euler angles, Cardan angles?) I’d use a replacement for rotationVectors, but applyMatrixRowsToVec3 would remain the same of course.

Is that helpful? Let me know if I can clarify any of that…

@spacemonkey: That’s very true.

(As it happens, I’d argue that the code I ended up above is about the simplest to read/maintain. :wink: )

My day job is writing/maintaining a C++ compiler, BTW. So I’m not necessarily versed in the most efficient graphics techniques and this is just part of an exploration for a program that is generating very large meshes. Also, I haven’t done nontrivial work with an interpreter in a while, so it’s interesting to see how optimizing calculations is different in that environment.

Hi @daveedvdv,

This looks very nice, and I agree with your reply to Ignatz: quaternions are the best internally, and have stability properties that rotation matrices can’t match. However, matrices are built for applying to vectors so applying a matrix to a vector is the fastest way to do that. That said, I’d be interested in the cost for applying a quaternion to a vector since then you have to add in the overhead of converting the quaternion to a matrix. This might not be faster than simply applying the quaternion directly (which involves two quaternion multiplications).

Did you compare row-based and column-based matrices? I tend to think of matrices as being lists of column vectors, so applying a matrix to a vector would be:

M v = v.x M[1] + v.y M[2] + v.z M[3]

My quaternion code that Ignatz refers to certainly is not optimised for speed. I went for utility mainly. So, for example, I extended the various existing functions so that they would take quaternions directly and not need converting to matrices first (of course, they do the conversion internally). Anyway, there are routines for converting quaternions to matrices.

Incidentally, you may already know this but having made these errors several times myself I thought I’d give you a heads up just in case you hadn’t realised them: in OpenGL, matrices act on the right, and also are 4x4 matrices so that they encode affine transformations, not just linear ones.

The current version of my Quaternion library can be found on github. There are two tabs there: Quaternion.lua and VecExt.lua. The second is the real one, and includes various extensions to vec2 and vec3. But Ignatz wanted it not to be so long so it is partially “minimised”, making it almost impossible for anyone else to read! The Quaternion.lua tab, on the other hand, is the non-minimised version but has been commented out.

Hi @Andrew_Stacey:


I’d be interested in the cost for applying a quaternion to a vector since then you have to add in the overhead of converting the quaternion to a matrix. This might not be faster than simply applying the quaternion directly (which involves two quaternion multiplications).

That’s why I qualified my statement with “Unless N is very small…”; if you only need to rotate a few (say 2 or 3, maybe 4 or 5 thought I doubt it) vectors, then using Hamiltonian multiplications is not enough of a penalty compared to dot products to compensate for the cost of constructing the rotation matrix.


Did you compare row-based and column-based matrices?

I hadn’t, but I have done so now because it’s a good question. The expression:

v = vec3(v:dot(r1), v:dot(r2), v:dot(r3))

appears to be over 2x faster than

v = v.x*c1 + v.y*c2 + v.z*c3

in Codea running on a Retina iPad mini.

I’m a bit surprised by that, to be honest. I had expected a difference, but a smaller one (a 10-20% advantage for the first form, for example). It might be worth double-checking me on that. My guess is that indexing a vector is more expensive than I thought (i.e., “v.x” probably requires a bytecode sequence).

in OpenGL, matrices act on the right, and also are 4x4 matrices so that they encode affine transformations, not just linear ones.

Indeed. That’s implicit in this expression from my code above:

return  vec3(m[1], m[2], m[3]),
        vec3(m[5], m[6], m[7]),
        vec3(m[9], m[10], m[11])

Codea’s matrix — which I assume just maps the OpenGL structure — officially produces column-major, but my code treats it as row-major because I apply it on the left (i.e., I really apply the transpose). (I do that so I can use to improve speed. And since I work with vec3s and rotations only, I just drop the m[4], m[8], m[12]…m[16] elements.)

Thanks for the pointers to your quaternion library! If you wanted to speed things up, I think I see some opportunities for using vector operations in things like the Hamilton product (which is probably a key function?).

@Ignatz, I’m curious: What’s the underlying motivation for preferring the VecExt.lua over Quaternion.lua?

Incidentally, what’s the recommended way to ask for new features?

For example, it appears there is no intrinsic for performing element-wise vector multiplies. E.g.:

    x = vec4(1, 2, 3, 4)
    x = x*x    -- Not currently valid. It would be nice if this made
               -- x equal to (1, 4, 9, 16)

Similarly, intrinsics for various matrix operations would be nice.

@daveedvdv - wrt Andrew’s two versions of his quaternion library, the original was over 1000 lines long and contained many functions that weren’t needed for what we were doing.

wrt new features, just ask as you have done. There are many beta testers on the forum who will pick up your request if they like it.

@Ignatz: Thanks! (Also for giving me hope that my wishes are heard.)

I can see you’d want just the functions you need, but having the nice formatting, identifiers, and comments seems desirable in any case.

@daveedvdv - for general purposes, you might want all the extra stuff, but not just for flying rotations!

We look forward to seeing some of your results (if you care to share them, naturally).

@Ignatz: Which results beyond the one I reported would you like? I’m happy to report anything that might be useful. FWIW, all I do is run code like:

local start = os.clock()
for n = 1,1000000
  -- Do something short here.

I’m guessing you guys are already familiar with this, but I just found that Wikipedia has a nice page on the connection between Quaternions and rotations. They even have an operation count analysis:

As noted earlier, that’s not directly useful in the context of interpreted computation, but still interesting.

@daveedvdv - no, what you’ve reported is fine, thank you.

By the way, our users are a very broad mixture of amateur and professional, young and old, so if you post something interesting, a simple example to help those less familiar with the technicalities (including myself) will be appreciated. That’s only if you want to do so (and have the time to do it), naturally.

@Ignatz: Sounds good. I’ll try to be more liberal with examples.

@Ignatz @daveedvdv I’m a beta tester and I pick up on any new requests by the users, but as a beta tester I have no control over what new features get added. If I did, my suggestions would be at the top of the list.

Here is some example code using the fast rotation above. It contains a fairly general cylinder mesh generator (which I’m hoping is efficient too, though I don’t have something to benchmark it against) and then some code to demo that.

-- MeshTest

function sweepHollowCylinderMesh(basePoint, axis, radius, aN, cN)
-- This function creates a mesh of triangles that form a cylinder (without
-- tops). The mesh' vertices and normals are set, but not its colors.
-- This returns the mesh and its number of vertices.
-- basePoint (vec3): One end of the cylinder's axis
-- axis (vec3): The length and direction of the axis
-- radius (vec3 or number): Either the vector swept around the axis to
--                          describe the cylinder, or just a numerical radius
-- aN, cN: The number of steps in the axial and circular sweeps, respectively.

    -- Create the mesh and pre-size its position and normal buffers.
    local m = mesh()
    local bufLen = 6*cN*aN
    local posBuf = m:buffer("position")
    local normBuf = m:buffer("normal")
    -- If radius is given as a length, turn it into a vector orthogonal
    -- to the axis.
    if type(radius) == "number" then
        local r = radius
        local ax, ay, az = axis.x, axis.y, axis.z
        -- Beware of catastrophic cancelation
        if az >= math.abs(ax) and az >= math.abs(ay) then
            radius = vec3(az, az, -ax-ay)
        elseif ay >= math.abs(ax) and ay >= math.abs(az) then
            radius = vec3(ay, -ax-az, ay)
        elseif ay >= math.abs(ax) and ay >= math.abs(az) then
            radius = vec3(-ay-az, ax, ax)
        radius = r*radius:normalize()
    -- Compute the incremental rotation vector and the incremental
    -- axial sliding vector for sweeping the mesh.
    local rot1, rot2, rot3 = rotationVectors(axis, 360/cN)
    axis = axis/aN
    -- Sweep the mesh with axial sliding in the inner loop since that
    -- is more efficient than the rotational sweep.
    local i = 1 -- Index into the buffers
    local r1, r2, b1, b2
    r1 = radius
    for c = 1, cN do
        -- Rotate the radius vector, but ensure that the last one is
        -- exactly the one we started with.
        if c == cN then
            r2 = radius
            r2 = applyMatrixRowsToVec3(r1, rot1, rot2, rot3)
        b1 = basePoint
        for a = 1, aN do
            b2 = b1 + axis
            -- Create two triangles:
            local p2 = b1+r2
            local p3 = b2+r1
            posBuf[i] = b1+r1
            posBuf[i+1] = p2
            posBuf[i+2] = p3
            posBuf[i+3] = p2
            posBuf[i+4] = p3
            posBuf[i+5] = b2+r2
            normBuf[i] = r1
            normBuf[i+1] = r2
            normBuf[i+2] = r1
            normBuf[i+3] = r2
            normBuf[i+4] = r1
            normBuf[i+5] = r2
            i = i + 6
            b1 = b2
        r1 = r2
    return m, bufLen

function setup()
    basePt = vec3(0, -50, 0)
    axis = vec3(10, 100, 10)
    radius = 50
    hN = 2; cN = 24
    m, nVertices = sweepHollowCylinderMesh(basePt, axis, radius, hN, cN)
    colorBuf = m:buffer("color")
    for n = 1, nVertices, 6 do
        local c = color(240+math.random(-15, 15))
        for i = n, n+5 do
            colorBuf[i] = c
    camPos = { angle = 0 }
    tween(12, camPos, { angle = 2*math.pi }, { loop = tween.loop.forever})

function draw()
    -- Clear screen to black.
    background(0, 0, 0, 255)
    perspective(45, WIDTH/HEIGHT, 200, -200)
    camera(0, 400*math.cos(camPos.angle), 400*math.sin(camPos.angle),
           0, 0, 0,
           0, math.cos(camPos.angle+math.pi/2), math.sin(camPos.angle+math.pi/2))

@dave1707: Thanks. I cannot wait so see what 2.0 will bring.

@daveedvdv - thanks for the demo, it makes it much easier to see how it’s used

@Ignatz - Wrong emoticon, or am I missing something?