What is the most efficient way to draw lots of meshes?

I got the same averages as @dave1707 on my iPad 4 using the latest beta.

However, I decided to look a little closer at the numbers being produced and discovered that the variance in the framerate is quite large: it ranges from about half the average to about twice it. This results in extremely choppy animation.

If your mesh rectangles are following a deterministic path (as they are in @West’s code), it is far, far more efficient to use a shader to update their trajectory. You pass in a load of initial data up front, but at each draw cycle you only pass in the elapsed time. The shader then computes the updated position. Moreover, as this happens on the GPU, it can be done in parallel rather than in a single thread on the CPU. Not only is this far, far faster, it also results in much smoother animation.
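
The pattern looks something like this. This is a minimal Codea-style sketch of the technique, not @LoopSpace’s actual explosion shader; the `velocity` attribute and `time` uniform are illustrative names:

```lua
-- Sketch: each vertex carries its own trajectory data as a custom
-- attribute, and the vertex shader moves it using a single "time"
-- uniform that is the only thing updated per frame.
function setup()
    m = mesh()
    m.shader = shader([[
        uniform mat4 modelViewProjection;
        uniform float time;
        attribute vec4 position;
        attribute vec2 velocity;   // per-vertex trajectory data
        void main()
        {
            vec4 p = position;
            p.xy += velocity * time;   // trajectory computed on the GPU
            gl_Position = modelViewProjection * p;
        }
    ]], [[
        precision highp float;
        void main() { gl_FragColor = vec4(1.0); }
    ]])
    m:addRect(WIDTH/2, HEIGHT/2, 20, 20)
    local vel = m:buffer("velocity")   -- custom attribute buffer
    vel:resize(6)                      -- one entry per vertex (2 triangles)
    for i = 1, 6 do vel[i] = vec2(50, 30) end
end

function draw()
    background(0)
    m.shader.time = ElapsedTime        -- the only per-frame update
    m:draw()
end
```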

For example, using my explosion shader (which does the same: moves and rotates rectangles), I get a framerate of 30 with 27,000 rectangles. At 55,000 rectangles, my framerate is 20. At 110,000 the framerate is 11. As with the code here, once the framerate starts to drop it is inversely proportional to the number of rectangles, but the number of rectangles needed before it starts to drop is far higher.

(Incidentally, @Ignatz’s terminology is incorrect. A linear relationship is described by an equation of the form y = m x + c and this is not. Rather, it is inversely proportional in that the number of rectangles times the framerate is roughly constant. You could say that the relationship between the number of rectangles and the time taken for each frame to render is linear but “time taken” is the reciprocal of the framerate which was the quantity being discussed.)
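
For concreteness, multiplying out the figures quoted above shows the product is of the same order throughout, as the inverse-proportionality claim predicts:

```lua
-- rectangles × framerate from the explosion-shader figures
print(27000 * 30)    -- 810000
print(55000 * 20)    -- 1100000
print(110000 * 11)   -- 1210000
```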

So if you can, shift the mesh’s movement into a shader. An explanation of my explosion shader can be found at http://loopspace.mathforge.org/HowDidIDoThat/Codea/Shaders/.

Hi @LoopSpace,

Regarding:

…it is far, far more efficient to use a shader to update their trajectory. You pass in a load of initial data up front, but at each draw cycle you only pass in the elapsed time. The shader then computes the updated position.

How do you pass arbitrary data in? Is it just a set of numeric variables, or do you ‘spoof’ up a texture and read results out of, e.g., RGBA values?

Just curious…or does the shader now support table/array data?

@Brookesi Take a look at the link I posted. That contains the details of how to pass this information through.
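
To answer the question directly: in Codea, per-vertex data goes in through custom attribute buffers rather than a spoofed texture. A minimal sketch, assuming a vertex shader that declares `attribute float angle;` (the names here are illustrative):

```lua
-- Any attribute declared in the vertex shader can be filled from
-- Lua via mesh:buffer, so per-vertex data goes in as numbers.
local m = mesh()
m:addRect(0, 0, 10, 10)          -- a rect is 2 triangles = 6 vertices
local angles = m:buffer("angle") -- matches "attribute float angle;"
angles:resize(6)
for i = 1, 6 do
    angles[i] = math.rad(45)     -- per-vertex scalar data
end
-- Values shared by all vertices go in as uniforms instead:
-- m.shader.time = ElapsedTime
```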

@TechDojo @LoopSpace I’m still running the slow version of Codea. I expect the fixed version will result in a speed increase of about 3 times. I tried timing individual frames and found that the time varied a lot from frame to frame. That’s why I used an average over the whole run; it’s easier to get a stable value. I use the total number of draw cycles divided by the sum of DeltaTime.
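
The averaging described above can be sketched in Codea-style Lua (assuming the usual `draw`/`DeltaTime` globals; the variable names are mine):

```lua
-- Average framerate over the whole run: total draw cycles
-- divided by the accumulated DeltaTime.
local frames, elapsed = 0, 0

function draw()
    frames = frames + 1
    elapsed = elapsed + DeltaTime
    local avgFPS = frames / elapsed   -- average framerate so far
end
```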

Hmm - as a lot of those triangles pass over each other as they move, I wonder if the GPU is doing some clever stuff ignoring overdrawn pixels, so the actual draw time could fluctuate. Alternatively it might be down to the rotation (matrix calculation) taking effect; it might be interesting to stop the rotation and just use a fixed angle to see if that makes a difference.

@dave1707 - I actually really noticed the slowdown for the first time yesterday. I was playing with an old fractal landscape demo, where most of the work is done in setup to generate the mesh, and then each frame simply repositions the camera. What I noticed is that the initial startup took a lot longer (so much so that I initially thought the demo had crashed on the new build), but when actually running, the difference was minimal, if anything. So I guess any timings of processor-intensive operations should be ignored until the new version is released.

@LoopSpace - thanks for sharing the shader code, I’m still trying to get my head around your perspective correct shader :slight_smile:

@dave1707 I ran West’s code on my iPad and got the same figures as you did, so the speed-up is entirely down to using shaders and not to our being on different betas. Also, while the average fps gives a reasonable overview, looking at the variation can be important too. I looked at the time taken per frame and saw that it jumped a lot, so I looked at the minimum and maximum time over the last ten frames as well. That’s where I saw that it varies from half to double the average.
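
One way to track that variation is to keep the last ten frame times and report their extremes alongside the average (a hedged sketch; the helper name is mine, not @LoopSpace’s code):

```lua
-- Rolling window of the last ten frame times; returns the
-- shortest and longest recent frame.
local times = {}

function recordFrame(dt)
    table.insert(times, dt)
    if #times > 10 then table.remove(times, 1) end
    local lo, hi = math.huge, 0
    for _, t in ipairs(times) do
        lo = math.min(lo, t)
        hi = math.max(hi, t)
    end
    return lo, hi
end

-- In draw(): local lo, hi = recordFrame(DeltaTime)
```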

@TechDojo Which is the “perspective correct shader” you’re talking about?

@TechDojo - In my 3D work I’ve noticed (the obvious) that the more pixels need colouring, the slower it is, so I wondered if that had an effect here because the screen is so crowded.

I tried simply restricting the images to 1/4 of the screen, so 3/4 was blank, and it had no effect on speed whatsoever. That surprised me, because another thing I learned from 3D is that OpenGL is extremely efficient at culling unseen vertices, and restricting the screen space should mean fewer visible vertices.

@LoopSpace - it was this one http://loopspace.mathforge.org/HowDidIDoThat/Codea/Gradient/

@Ignatz - From my reading of OpenGL, I think it can detect (possibly through the use of a z-buffer) whether pixels have already been drawn and then not draw them again. I remember something about rendering semi-transparent triangles and making sure they are drawn in the correct Z order; this would obviously be beneficial if the triangles were pre-sorted in Z.

I worked for a company many years ago that created an arcade board which was very good at rasterising spans of pixels for objects across each scanline, ensuring that there was no overdraw. It was very fast and particularly good at scaling sprites (à la Afterburner and Outrun), but semi-transparency was a real issue back then (I don’t think it was supported - but then it was 1993 :slight_smile: ).