Optimization Question - Drawing many meshes/sprites

Hey everyone,

I’ve been struggling with drawing a large number of sprites. To be precise, my game needs to draw a lot of grass. Basically, I use a mesh whose texture shows a few blades of grass, animated with a wind-shader. For a nicer effect, I render a couple of versions of it with different wind strengths and use setContext(...) to turn each one into an image on the fly. This way, instead of creating hundreds of meshes that each run the shader on their own, I only draw those pre-rendered images using sprite(...).

Before the game even starts to draw anything, I use a custom algorithm to distribute the grass across a 2d-plane. At the end, I have a 2d-array with probably thousands of entries, which contains the positions where a grass-sprite should be drawn.

In the actual draw-method, the grass-sprites are drawn: I loop through the 2d-array and draw one of the few variations of the sprite at each position. Sometimes hundreds of grass-sprites end up being drawn at the same time, which (obviously) brings down the performance considerably. And that is only a fraction of all grass-positions: the 2d-array covers the entire 2d-plane, but the camera follows the player, so only part of the plane is visible at any time.
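The camera-based filtering described above can be sketched in plain Lua. This is a minimal sketch that assumes each grass position is a table with `x` and `y` fields and that the camera's view is an axis-aligned rectangle given by its bottom-left corner and size; `isVisible`, `visibleGrass` and the `margin` parameter are hypothetical names, not Codea functions.

```lua
-- Hypothetical helper: true if a grass sprite centred at (pos.x, pos.y)
-- could overlap the view rectangle. `margin` should be about half the
-- sprite size so sprites straddling the edge are not culled too early.
function isVisible(pos, left, bottom, width, height, margin)
    return pos.x >= left - margin and pos.x <= left + width + margin
       and pos.y >= bottom - margin and pos.y <= bottom + height + margin
end

-- Collects only the positions that can be on screen; everything else is
-- skipped before any sprite() or mesh work happens.
function visibleGrass(positions, left, bottom, width, height, margin)
    local result = {}
    for _, pos in ipairs(positions) do
        if isVisible(pos, left, bottom, width, height, margin) then
            result[#result + 1] = pos
        end
    end
    return result
end

-- Usage sketch for a camera centred on the player at (px, py):
-- local onScreen = visibleGrass(grass, px - WIDTH/2, py - HEIGHT/2, WIDTH, HEIGHT, 32)
```

Note that this still visits every position once per frame; it only saves the drawing work for off-screen grass.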

So I thought using a single mesh would help. I created an image grassAtlas = image(...), used setContext(grassAtlas) and then drew all the different grass versions next to each other. My idea was that this one image, the grassAtlas, could serve as the mesh’s texture, and for every rect I add with mesh:addRect(...) I could pick the version with mesh:setRectTex(...). That way I would have one single mesh, but could still use the same variations of the grass-sprite as before.
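For a single-row atlas like this, the rect passed to setRectTex can be computed from the variant's index. A minimal sketch, assuming the variants are equally wide, fill the whole atlas height, and that setRectTex takes normalized 0 to 1 texture coordinates; `atlasTexRect` is a hypothetical helper:

```lua
-- Hypothetical helper: normalized texture rect (s, t, w, h) of variant
-- `index` (1-based) in an atlas made of `count` equally wide variants placed
-- side by side in a single row, spanning the full atlas height.
function atlasTexRect(index, count)
    assert(index >= 1 and index <= count, "variant index out of range")
    local w = 1 / count
    return (index - 1) * w, 0, w, 1
end

-- Usage inside Codea (sketch, not executed here):
--   local r = grassMesh:addRect(x, y, rectW, rectH)
--   grassMesh:setRectTex(r, atlasTexRect(math.random(10), 10))
```

Because the rect only depends on the variant index, it can be set once when the grass is generated and never touched again during draw().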

As long as I don’t draw the mesh, everything’s fine. I was worried that adding thousands of rects to the mesh could be an issue, but it doesn’t seem to be. However, drawing the mesh instead of the sprites hurts performance even more, to the point where the game’s basically unplayable. This seems odd to me: drawing the same number of sprites was heavy on performance too, but the game still ran at a decent and stable framerate.

Now I have two questions:

  • Should my approach theoretically work? By that I mean: is my assumption correct that a single mesh with a single texture atlas, where mesh:setRectTex(...) assigns a different part of the atlas to each rect, needs only one draw call instead of one per rect?
  • Is there any efficient way to draw that many rects of a mesh/add that many rects to a mesh? I’m fully aware that both of my approaches are terrible as far as memory consumption and performance are concerned, but I also don’t know of a better one.

Thanks in advance and sorry for the long text!

@Elias Not sure about what you’re doing, but here’s a stripped-down version of a mesh program I’ve used to check the FPS of each Codea version as it came out. I’ve used it since version 2.1. This is set up to run 3000 meshes of size 30, and it runs at 59.99 FPS on my iPad Air 3. I run the original program with different sizes and numbers and keep track of the FPS across versions. Don’t know if this helps any or not.

```lua
viewer.mode=FULLSCREEN

function setup()
    size=30     -- side length of each triangle
    nbr=3000    -- number of separate meshes

    count=0     -- frame counter, used to spin the triangles
    fill(0)
    tab={}
    for z=1,nbr do -- create nbr meshes at random positions and angles
        table.insert(tab,m(math.random(50,WIDTH-50),
            math.random(50,HEIGHT-50),math.random(360)))
    end
end

function draw()
    background(132, 224, 217, 255)
    for a,b in pairs(tab) do
        b:draw()   -- one mesh:draw() call per mesh
    end
    count=count+1
    text(1/DeltaTime,WIDTH/2,HEIGHT-25)   -- current FPS
end

m=class()   -- mesh

function m:init(x,y,r)
    self.x=x
    self.y=y
    self.ms=mesh()
    self.rot=r
end

function m:draw()
    pushMatrix()
    translate(self.x,self.y)
    rotate(self.rot+count)
    -- in this version the vertices and colors are reassigned every frame
    self.ms.vertices={vec2(0,0),vec2(size,size),vec2(size,0)}
    self.ms:setColors(255,0,0)
    self.ms:draw()
    popMatrix()
end
```

I think you can avoid calling setRectTex() by moving the animation part into the shader. You can pass a counter as a uniform and use it to offset the UVs.
For that, you’ll need to think carefully about where to put your images in the atlas.
For instance, on the X axis, spread the variations, and on the Y axis, spread the animation frames.
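A sketch of that layout in plain Lua, assuming a grid of equally sized cells with variants along X and frames along Y, and frame 1 starting at t = 0; `atlasCell`, `frameForTick` and `frameOffset` are hypothetical helpers, and how the offset reaches the shader (for example as a uniform) is up to your own shader code:

```lua
-- Hypothetical atlas layout: `variants` columns (one grass variant each,
-- along X) and `frames` rows (one animation frame each, along Y).
-- Returns the normalized rect (s, t, w, h) of one cell, frame 1 at t = 0.
function atlasCell(variant, frame, variants, frames)
    local w, h = 1 / variants, 1 / frames
    return (variant - 1) * w, (frame - 1) * h, w, h
end

-- Animation frame (1-based) for a frame counter: advances every
-- `ticksPerFrame` ticks and loops over `frames` frames.
function frameForTick(tick, ticksPerFrame, frames)
    return math.floor(tick / ticksPerFrame) % frames + 1
end

-- Moving from frame 1 to frame n only shifts t by (n - 1) / frames, the same
-- amount for every rect, so this single number is all a shader would need.
function frameOffset(frame, frames)
    return (frame - 1) / frames
end
```

Each rect would get its frame-1 cell once at startup; during animation only the single offset value changes, which is what makes per-rect setRectTex calls unnecessary.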

Also, unrelated: one big 1D table should perform better than a 2D table.
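A minimal sketch of that conversion, assuming the 2D array is indexed as grid[i][j] and holds nil where there is no grass; the index bounds are parameters so both 0-based and 1-based tables work. `flattenGrid` is a hypothetical name:

```lua
-- Hypothetical helper: copies the non-nil entries of grid[i][j] for
-- i in iFirst..iLast and j in jFirst..jLast into one flat array, row by row.
function flattenGrid(grid, iFirst, iLast, jFirst, jLast)
    local flat = {}
    for i = iFirst, iLast do
        local row = grid[i]
        if row ~= nil then
            for j = jFirst, jLast do
                local pos = row[j]
                if pos ~= nil then
                    flat[#flat + 1] = pos
                end
            end
        end
    end
    return flat
end

-- The draw loop then becomes a single pass with no nested table lookups:
-- for k = 1, #flatGrass do local pos = flatGrass[k] ... end
```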

@Elias From what you’re describing, you’re calling mesh:addRect(…) every frame for every grass image to be drawn, regenerating the mesh every frame. Is that right?

If so, and assuming that the grass positions don’t change at all most of the time, then rather than generating a mesh for the on-screen grass on each frame, split the 2D plane into ‘chunks’. For example, a 128x128 world divided into 16x16 chunks gives 8x8 = 64 chunks. Generate a grass mesh for each chunk up front (not during rendering), then while rendering draw any chunk that is at least partially on screen. The offscreen grass within those chunks should be culled efficiently by the GPU with minimal overhead. This makes ‘grass in view’ culling far quicker and also avoids regenerating the mesh on every frame (which is sub-optimal at best due to bandwidth costs).
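The bookkeeping side of this can be sketched in plain Lua. It only groups positions into chunks and picks the ones overlapping the view; in Codea, each chunk's position list would be turned into its own mesh with addRect once, outside draw(), and only the returned chunks' meshes would be drawn. `buildChunks`, `visibleChunks` and the string chunk keys are hypothetical choices, and positions are assumed to be tables with `x` and `y` fields.

```lua
-- Hypothetical chunking helper: buckets grass positions into square chunks
-- of `chunkSize` world units. Chunk (cx, cy) covers
-- [cx*chunkSize, (cx+1)*chunkSize) on each axis.
function buildChunks(positions, chunkSize)
    local chunks = {}
    for _, p in ipairs(positions) do
        local cx = math.floor(p.x / chunkSize)
        local cy = math.floor(p.y / chunkSize)
        local key = cx .. "," .. cy
        local chunk = chunks[key]
        if not chunk then
            chunk = { cx = cx, cy = cy, positions = {} }
            chunks[key] = chunk
        end
        table.insert(chunk.positions, p)
    end
    return chunks
end

-- Returns the chunks overlapping the view rectangle (left, bottom, width,
-- height). A view edge lying exactly on a chunk border may include one
-- extra chunk, which is harmless.
function visibleChunks(chunks, chunkSize, left, bottom, width, height)
    local result = {}
    local cx0 = math.floor(left / chunkSize)
    local cy0 = math.floor(bottom / chunkSize)
    local cx1 = math.floor((left + width) / chunkSize)
    local cy1 = math.floor((bottom + height) / chunkSize)
    for cx = cx0, cx1 do
        for cy = cy0, cy1 do
            local chunk = chunks[cx .. "," .. cy]
            if chunk then table.insert(result, chunk) end
        end
    end
    return result
end
```

The per-frame cost then depends on the number of chunks in view rather than on the total number of grass positions.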

Combined with @moechofe2’s suggestion I’d expect you to see some better results.

When it comes to mobile GPU performance, modern iOS devices are far more capable than people actually realise; it just requires a little optimisation (e.g. https://apps.apple.com/gb/app/alien-isolation/id1573029040).

It’s also worth noting that, unless there’s caching going on in Codea’s runtime, every call to mesh:setRectTex(…) and mesh:addRect(…) may be introducing an OpenGL call too, which can really add up to poor performance.

@dave1707 Thanks for the program, I’ll check it out to see if it is similar to what I’m looking for!

@moechofe2 Thanks, I’ll consider this later on! However, I don’t think that this is the reason it performs so badly. Didn’t know that about tables, thanks!

@Steppers Almost. I use mesh:addRect(...) when generating the grass-positions and mesh:setRectTex(...) to assign one of the ten grass-sprites in the texture-atlas, also while generating the positions. I only occasionally need to change the color of certain rects, but once the project is initialized, no new rects are added to the mesh.

As for your suggestions about using chunks: I’ve considered this as well, though shouldn’t one mesh already equal one draw call? I assumed that using one mesh with one texture wouldn’t end up causing multiple draw calls.

@Elias You’re right in that having one mesh and one texture should result in one draw call, the chunk thing was mainly to have your culling logic (if any) be more efficient.

Also, if you are modifying the mesh with setRectTex every frame, then that’s also unlikely to be efficient.

@Steppers Hm, but I’m also only using setRectTex at the start. I’ve also noticed that just adding the rects and drawing the mesh already tanks the performance. Maybe adding all those rects is already too much…

@Elias Here’s another version. I’m using addRect and I’m not creating the mesh every draw cycle like in the previous example. I display the FPS and the number of meshes. I don’t know how many rects you were creating, so you can alter the size variable to change how many rects are created.

A size of 10 creates 9,213 rects and runs at 59.99 FPS on my iPad Air 3.

```lua
viewer.mode=FULLSCREEN

function setup()
    size=15             -- rect size; smaller values create more rects
    xs=WIDTH//size      -- columns
    ys=HEIGHT//size     -- rows
    fill(255)
    tab={}
    -- one separate mesh per grid cell, xs*ys meshes in total
    for x=0,xs-1 do
        for y=0,ys-1 do
            table.insert(tab,m(x*size,y*size,math.random(360)))
        end
    end
end

function draw()
    background(132, 224, 217, 255)
    for a,b in pairs(tab) do
        b:draw()
    end
    text(1/DeltaTime,WIDTH/2,HEIGHT-25)   -- FPS
    text(xs*ys,WIDTH/2,HEIGHT-50)         -- number of meshes
end

m=class()   -- mesh

function m:init(x,y,r)
    self.x=x
    self.y=y
    -- the mesh and its single rect are built once here, not every frame
    self.ms=mesh()
    self.ms:addRect(self.x+size/2,self.y+size/2,size,size)
    self.ms:setColors(math.random(255),math.random(255),math.random(255))
end

function m:draw()
    self.ms:draw()
end
```

@dave1707 Thank you, this is extremely helpful! I’ve increased the number of rects to be added and saw a significant drop in performance. I also added a way to change between drawing the mesh and drawing the same amount of sprites. For some reason, the performance when using sprites was better…

@Elias Here’s another version that uses a shader to move the grass in a random x,y direction each frame. I don’t have a wind shader or know what grass you’re using for the texture, so I just used what I had.

On my iPad Air 3:

  • At a size of 10, there are 9,213 rects and it runs at 59 FPS.
  • At a size of 5, there are 36,852 rects and it runs at 59 FPS.
  • At a size of 2, there are 231,852 rects and it runs at 59 FPS.
  • At a size of 1, there are 927,408 rects and it runs at 14 FPS.

```lua
viewer.mode=FULLSCREEN

function setup()
    img=readImage(asset.builtin.Blocks.Grass4)
    size=10
    xs=WIDTH//size
    ys=HEIGHT//size
    fill(255)
    m=mesh()            -- a single mesh holds every rect
    for x=1,xs do
        for y=1,ys do
            m:addRect(x*size,y*size,size,size)
        end
    end
    m.texture=img
    m.shader=shader(vShader,fShader)
end

function draw()
    background(40, 40, 50)
    -- random texture offset for this frame, passed to the shader as uniforms
    m.shader.xset=math.random(-1,1)*.2
    m.shader.yset=math.random(-1,1)*.2
    m:draw()            -- one draw call for all rects
    text("FPS "..1//DeltaTime,WIDTH/2,HEIGHT-30)
    text("# Rects "..xs*ys,WIDTH/2,HEIGHT-60)
end

vShader = [[
uniform mat4 modelViewProjection;
attribute vec4 position;
attribute vec4 color;
attribute vec2 texCoord;
varying lowp vec4 vColor;
varying highp vec2 vTexCoord;
void main()
{   vColor=color;
vTexCoord = texCoord;
gl_Position = modelViewProjection * position;
}    ]]

fShader = [[
uniform lowp sampler2D texture;
varying lowp vec4 vColor;
varying highp vec2 vTexCoord;
uniform lowp float xset;
uniform lowp float yset;
void main()
{ // shift every texture lookup by (xset, yset)
gl_FragColor=texture2D( texture,vec2(vTexCoord.x+xset,vTexCoord.y+yset))*vColor;
}    ]]
```

Here’s a fun one to play around with: meshes.zip

What I’ve learned is that shaders impact performance the most, followed by the size of assets, and overall, of course, everything is dragged down by the number of assets.

(One interesting note: I’ve noticed it takes much less to drop below 120 FPS than it does to drop below 60 FPS, 120 and 60 being your device’s maximum and half refresh rates.)
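One likely factor behind that observation is the per-frame time budget: at a refresh rate of f frames per second, each frame has to finish within 1000/f milliseconds, so the budget at 120 FPS is half the budget at 60 FPS:

```latex
t_{120} = \frac{1000\,\mathrm{ms}}{120} \approx 8.33\,\mathrm{ms}, \qquad
t_{60} = \frac{1000\,\mathrm{ms}}{60} \approx 16.67\,\mathrm{ms} = 2\,t_{120}
```

So a frame that needs, for example, 10 ms of work already misses 120 FPS but still fits within the 60 FPS budget.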

I swear it used to perform better, but I don’t have any actual data.

If memory serves, I used to be able to show 5-6 thousand meshes at full performance (120 FPS on my iPad Pro), but now it drops below 120 at around 2.2K+.

@dave1707 Thank you! For some reason this runs much better than my modified version of your previous code. I’ll need to check what the difference is between the two.

@skar Thanks, I’ll check your project out as well! That’s interesting; I’d be curious what the reason for that could be.

@Elias The difference between the two is that the first one creates multiple meshes, while the last one creates just one mesh and adds the rects to it.

@dave1707 Exactly, but I changed the first example so that it only uses one mesh and adds the rects to it. So I must have messed something up there, and that might also be why the grass in my game performed so badly.

@Elias Is there any way you could post just your mesh grass code? If we can see what it’s doing, we might be able to help more. Is there a difference in speed if you show the mesh with and without the wind shader?

Keep in mind that the size of your texture will impact performance. I previously had an issue where I thought I was using a 300x300 PNG, but I was actually using a 1600x1600 PNG where only the middle 300x300 had pixels. This was hard to figure out, but all those extra pixels, even though they had alpha 0, still ended up in the fragment shader.
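The arithmetic behind that is worth spelling out. Assuming an uncompressed 8-bit RGBA texture (4 bytes per pixel; an assumption, since the post doesn't say how the image was stored), a 1600x1600 image holds roughly 28 times as much data as a 300x300 one, whether or not the extra pixels are transparent. `textureBytes` is a hypothetical helper:

```lua
-- Approximate memory of an uncompressed texture, assuming 4 bytes per pixel
-- (8-bit RGBA); fully transparent pixels take exactly as much as opaque ones.
function textureBytes(width, height)
    return width * height * 4
end

-- The two textures from the post (sizes in MiB):
-- textureBytes(1600, 1600) / 2^20  --> about 9.8
-- textureBytes(300, 300) / 2^20    --> about 0.34
```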

@Elias Without knowing exactly what you’re doing with the grass, I’m just throwing out random code. Here’s another example where I draw 6,348 grass rects. Tap the screen and about 3,200 of them, chosen at random, get redrawn with a wiggle. All of this runs at 59 FPS. Even if I change the size to 5, drawing 36,852 rects and wiggling about half of them, the FPS is still 59. Again, it would help if you could show some of the code you’re using for the mesh.

```lua
viewer.mode=FULLSCREEN

function setup()
    size=12
    xs=WIDTH//size
    ys=HEIGHT//size
    fill(255)
    tab={}      -- one entry per rect: x, y, and z = wiggle flag
    m=mesh()
    m.texture=readImage(asset.builtin.Blocks.Grass4)
    for x=1,xs do
        for y=1,ys do
            m:addRect(x*size,y*size,size,size)
            table.insert(tab,vec3(x*size,y*size,0))
        end
    end
    tot,cnt=0,0
end

function draw()
    background(40, 40, 50)
    cnt=cnt+1
    if cnt>20 then  -- wiggle every 1/3 second
        cnt=0
        for a,b in ipairs(tab) do
            if b.z==1 then -- if 1 then wiggle
                -- rect index a matches tab index a (same creation order);
                -- the last argument is a small random rotation
                m:setRect(a,b.x,b.y,size,size,math.random(-1,1)*.4)
            end
        end
    end
    m:draw()
    text("FPS "..1//DeltaTime,WIDTH/2,HEIGHT-30)
    text("# Rects "..xs*ys,WIDTH/2,HEIGHT-60)
    text("# Wiggles "..tot,WIDTH/2,HEIGHT-90)
end

function touched(t)
    if t.state==BEGAN then
        tot=0   -- count only the rects picked by this tap
        for r=1,xs*ys do
            tab[r].z=math.random(2) -- set to 1 or 2
            if tab[r].z==1 then
                tot=tot+1
            end
        end
    end
end
```

@skar You’re right, I hadn’t considered this. I basically use ten meshes that have a shader applied to them and use setContext(img) to draw all ten of them next to each other into img, which serves as a texture atlas for spritesheet-animations. I then supply img as the texture of the actual mesh with all the rects. This way, I only apply a shader to those ten meshes. The actual mesh then uses m:setRectTex(...) on individual rects to select one of the ten different versions of the grass. I don’t think the image generated this way is too large, but I should take a look at that.

@dave1707 Thanks for another example! I haven’t had the time yet to check whether I was doing something wrong compared to your code. But I’ll probably find some time for it today. If I can’t spot a difference, I’ll post my code here (unfortunately, it’s more complicated than just a quick copy-paste as there are some references to other parts in my code).

@Elias If you can’t post any code, maybe you could give a good description of what you’re actually trying to do with the mesh: which mesh commands you’re using, etc. Reading your posts above, it’s hard to get an idea of what’s going on.

So, I’ve spent some time today trying to figure out what the issue was. It’s actually kind of embarrassing… I had previously used a nested for-loop to draw the sprites:

```lua
for i = 0, value, 1 do
    for j = 0, value, 1 do
        local pos = table_that_contains_positions[i][j]
        -- some code
        sprite(...)
    end
end
```

When I replaced this version with drawing a mesh, I couldn’t just delete the entire loop, since it contains some code that changes the alpha-value of the grass when you step onto it. Long story short, I didn’t put mesh:draw() after the last end but inside the outer loop, which made Codea draw the whole mesh around 300 times every frame. No wonder the performance was so bad… Now it runs at almost 60 FPS.
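The difference can be reproduced outside Codea with a stand-in object that just counts draw() calls. `countingMesh`, `drawGrassBuggy` and `drawGrassFixed` are hypothetical names, and the empty inner loops mark where the per-rect work (such as the alpha change) would go:

```lua
-- Stand-in for a Codea mesh that only counts how often draw() is called.
function countingMesh()
    local m = { draws = 0 }
    function m:draw()
        self.draws = self.draws + 1
    end
    return m
end

-- Buggy: draw() sits after the inner loop but inside the outer one,
-- so the whole mesh is drawn once per row, every frame.
function drawGrassBuggy(grassMesh, value)
    for i = 0, value do
        for j = 0, value do
            -- per-rect work (e.g. alpha changes) goes here
        end
        grassMesh:draw()
    end
end

-- Fixed: the loops only update rect data; the mesh is drawn once.
function drawGrassFixed(grassMesh, value)
    for i = 0, value do
        for j = 0, value do
            -- per-rect work (e.g. alpha changes) goes here
        end
    end
    grassMesh:draw()
end
```

With value = 299 the buggy version issues 300 draws of the full mesh per call, the fixed one issues exactly one.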

Thanks to everybody for their help!