Mesh performance issues?

@Simeon @John Hey guys, I’ve been struggling to figure out some FPS issues I’m seeing. I’ve been able to narrow it down to mesh: for some reason, once there are a bunch of meshes, the FPS yo-yos between 120 and 80.

You can try to reproduce the issue with the example below:


```lua
function setup()
  parameter.number("xVal", 0, 100, 5)
  parameter.number("yVal", 0, 100, 0)
  m = mesh()
  m:addRect(WIDTH/2, HEIGHT/2, 600, 600)
  m.texture = asset.builtin.Cargo_Bot.Codea_Icon
  m.shader = shader(vert, frag)
  numInstances = 100
  xValTable = {}
  --cam = Camera({})
  --gesture = GestureManager(cam)
end

function draw()
  background(118, 33, 125)
  --cam:drawOrtho()
  -- per-instance x offsets, read in the vertex shader via gl_InstanceIDEXT
  for i = 1, numInstances do
    xValTable[i] = xVal * i
  end
  m.shader.xVal = xValTable
  m.shader.yVal = yVal
  m:draw(numInstances) -- instanced draw: numInstances copies of the quad
  --cam:drawOrthoNoChange()
  --FPSOverlay.draw()
end

function touched(t)
  --gesture:touchCallback(t)
end

vert=[[
#extension GL_EXT_draw_instanced: enable
uniform mat4 modelViewProjection;
attribute vec4 position;
attribute vec2 texCoord;
varying vec2 vTexCoord;
uniform float xVal[1019]; // set 1020 to trigger render error
uniform float yVal;
varying lowp float index;

void main() {
  index = float(gl_InstanceIDEXT);
  vTexCoord = texCoord;
  float xOffset = ceil(xVal[gl_InstanceIDEXT]);
  float yOffset = yVal;
  vec4 offset = vec4(xOffset, yOffset, 0, 0);
  gl_Position = modelViewProjection * (position + offset);
}
]]

frag=[[
#extension GL_EXT_draw_instanced: enable
precision highp float;
uniform sampler2D texture;   //diffuse map
varying vec2 vTexCoord;
varying float index;

void main() {
   gl_FragColor = texture2D(texture, vTexCoord);
}
]]
```



One thing I noticed as well is that using ortho can affect the FPS. You might notice some commented-out lines above; please use the attached dependencies to enable them. OrthoCamera adjusts the zoom, Gesture handles two-finger pinch zooming, and FPSOverlay shows the FPS timeline.

What’s EVEN STRANGER is that if I screen record, the FPS stays at 120, lol! So while screen recording is running, Codea never drops below 120 FPS, but as soon as I stop recording and let the program simply run, the FPS yo-yos again. See the attached images:

While screen recording: https://codea.io/talk/uploads/editor/8n/6t46rse26s6o.png
Normal project playing: https://codea.io/talk/uploads/editor/rs/ysnge4xs1wfk.png

I wish I could profile these issues myself and not have to get you guys involved if it’s not necessary. I’m also hoping that Codea4 will just magically make mesh performance better :smile:

I’d have to profile the code to know for sure, but it could be due to the large uniform array size (1000+ elements).
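One way to test the uniform-array hypothesis is to build the vertex shader source in Lua and size the array from the instance count instead of hard-coding 1019. This is a minimal sketch, not a confirmed fix: the `%d` placeholder and the `buildVertexSource` helper are hypothetical, and in Codea the resulting string would be passed to `shader()` in place of the fixed `vert` source from the example above.

```lua
-- Vertex shader from the example above, with the array size left as a %d
-- placeholder (hypothetical) to be filled in from Lua.
local vertTemplate = [[
#extension GL_EXT_draw_instanced: enable
uniform mat4 modelViewProjection;
attribute vec4 position;
attribute vec2 texCoord;
varying vec2 vTexCoord;
uniform float xVal[%d];
uniform float yVal;
varying lowp float index;

void main() {
  index = float(gl_InstanceIDEXT);
  vTexCoord = texCoord;
  float xOffset = ceil(xVal[gl_InstanceIDEXT]);
  vec4 offset = vec4(xOffset, yVal, 0.0, 0.0);
  gl_Position = modelViewProjection * (position + offset);
}
]]

-- Declare exactly as many array elements as there are instances
function buildVertexSource(numInstances)
  return string.format(vertTemplate, numInstances)
end

-- In setup(), this would replace the fixed-size source, e.g.
-- m.shader = shader(buildVertexSource(numInstances), frag)
print(buildVertexSource(100):match("uniform float xVal%[(%d+)%];"))
```

Keeping the array size tied to `numInstances` also means the shader never declares more uniform storage than the draw call can actually index.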

Thanks for responding! I tried dropping the array to 100 to match the number of instances, and it’s the same thing: it yo-yos between 120 and 80.

I know I’m piling on, but I have an isometric project I’m working on (you might have seen it in a different thread). With that one, for some reason, the FPS tanks hard when I zoom in, but zooming out improves it:
https://codea.io/talk/uploads/editor/no/q6jm2wezr7gg.png
https://codea.io/talk/uploads/editor/cp/2ucth9pb0okq.png
https://codea.io/talk/uploads/editor/io/z25lq9x3nfxw.png

If it would be helpful to you for profiling, I can provide it.

BTW, I hope your health is doing well.

Hey, I really hope I can get your help here; this issue is preventing me from pushing forward on my projects.

I’ve included the framework I set up for working with Codea; it is the “Core” zip. The Isometric demo is included as a separate project, so just make sure to set Core as a dependency of IsometricTest. If you can get me a profile dump or anything, I can dig through it to see where the major problems are.

The shader can be replaced with a simpler one for testing by going to the Shader tab in Core and changing line 11, for example to the ripple shader:

```lua
msh.mesh.shader = shader(self.allCode.vert, self.allCode.frag) --shader(asset.builtin.Effects.Ripple) --shader(self.allCode.vert, self.allCode.frag)
```

Please halp :cry:

@skar I’m trying to get your demo working but keep hitting one error after another. I’ve replaced a number of textures, but the latest one is a bitmap font error. Where can I find a suitable font to get past it?

Core - https://drive.google.com/file/d/1Tbi0jVbQmE4zGXYAXJM9PFbXTM0Ab5kk/view?usp=sharing
IsometricTest - https://drive.google.com/file/d/1znEhbQsQapIBAKkt9lv27ixNCuhUNRwl/view?usp=sharing

My fault, I thought it would work without all the extra stuff, but I’ve now updated the zips to include the missing assets. Unfortunately they are too big to upload directly here, so I’ve shared them from my Google Drive above.

@skar Feel free to share demo projects on WebRepo too! They’re always appreciated, and I think it has a limit of about 100MB. It even includes any project dependencies automatically when you submit (though not assets in docs).

@skar Zooming in may cause performance issues due to overdraw (i.e. many overlapping pixels being drawn on top of each other) combined with blending operations, which are expensive. The GPU can’t optimise this very well because there is no opaque geometry that could benefit from early z-buffer rejection (which relies on rendering objects front to back).

It was running at 10fps for me on my MacBook until I moved the viewer.preferredFPS call out of draw(); then it went up to around 85fps.
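In code, the change amounts to setting the frame rate once in `setup()` rather than on every `draw()` call. A minimal sketch: the value 120 is an assumption (use whatever rate you are targeting), and the first line only stands in for Codea’s built-in `viewer` table so the snippet also runs in plain Lua.

```lua
-- Stand-in so this runs outside Codea; inside Codea, viewer already exists.
viewer = viewer or {}

function setup()
  -- Set once here; the thread reports that calling it from draw()
  -- dropped the frame rate to around 10fps on a MacBook
  viewer.preferredFPS = 120
end

function draw()
  -- per-frame drawing only; no viewer.preferredFPS call here
end
```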

I had a look at the Core project and noticed an extremely large uber shader with dozens of uniform switches that are used to branch the shader in many different places (it also has an array uniform with 100 lights?).

I’m not sure if you are using that shader in the isometric project, but I will note that having so many uniforms, texture binds, loops and branches might cause some performance issues. Modern game engines typically use #define and #ifdef statements to avoid a lot of runtime branching. Codea also doesn’t have the most efficient way of finding and setting uniforms, so having 30-40 of them (as well as arrays of structs, etc.) can add significant overhead when you have many objects being drawn at once. You could potentially generate a set of shader variants automatically with different defines and compile them all when loading the project. With that many options you could end up with billions of possible shaders, so you need to choose carefully which combinations you actually use.
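The define-based approach described above can be sketched in Lua by prepending `#define` lines to a shader source and caching each combination. This is an illustration under assumptions: the feature names `USE_LIGHTS` and `USE_FOG`, the `buildVariant` helper and the placeholder lighting line are hypothetical, and compiling the result with Codea’s `shader()` is only shown in a comment.

```lua
-- Each combination of features becomes a separate source string with
-- #define lines prepended, cached so it is generated only once per source.
local variantCache = {}

function buildVariant(source, features)
  -- Sort so {"A", "B"} and {"B", "A"} share one cache entry
  local names = {}
  for _, name in ipairs(features) do names[#names + 1] = name end
  table.sort(names)
  local key = table.concat(names, "|")

  local perSource = variantCache[source]
  if perSource == nil then
    perSource = {}
    variantCache[source] = perSource
  end

  if perSource[key] == nil then
    local lines = {}
    for _, name in ipairs(names) do
      lines[#lines + 1] = "#define " .. name
    end
    lines[#lines + 1] = source
    perSource[key] = table.concat(lines, "\n")
  end
  return perSource[key], key
end

-- A fragment shader whose lighting code is compiled in only when requested
local fragSource = [[
precision highp float;
uniform sampler2D texture;
varying vec2 vTexCoord;

void main() {
  vec4 color = texture2D(texture, vTexCoord);
#ifdef USE_LIGHTS
  color.rgb *= 0.5; // placeholder for the real lighting code
#endif
  gl_FragColor = color;
}
]]

local litFrag = buildVariant(fragSource, {"USE_LIGHTS"})
local unlitFrag = buildVariant(fragSource, {})
-- In Codea, each variant would then be compiled once at load time, e.g.
-- litShader = shader(vertSource, litFrag), where vertSource is hypothetical.
```

Because the branch is resolved by the preprocessor, the unlit variant contains no lighting code at all, instead of skipping it with a uniform-driven `if` on every fragment.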

It’s difficult for me to offer much more profiling advice at this stage, since the source code looks pretty complicated and Lua performance is kind of opaque (you can’t easily measure which code is a hotspot). Your best bet is to try a stand-in shader that basically does nothing but draw an opaque quad, see if that makes it run faster, and then compare that to your uber shader. If it makes zero difference then you may be GPU limited, and you might also be running into garbage collection issues in Lua. Hard to say at this stage though.
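Lacking a proper profiler, a crude per-section timer can at least show which Lua-side phases dominate and roughly how much each allocates. This is a sketch under stated assumptions: it relies on `os.clock()` and `collectgarbage("count")` from standard Lua being available in Codea, and `profileSection`/`profileReport` are hypothetical helper names.

```lua
-- Rough per-section timer for Lua code. os.clock() reports processor time
-- for the whole process, so in a multithreaded app it can include other
-- threads' work, and it never includes GPU time: treat numbers as relative.
local timings = {}

function profileSection(name, fn, ...)
  local memBefore = collectgarbage("count")  -- KB currently used by Lua
  local start = os.clock()
  local result = fn(...)                     -- only the first return value is kept
  local elapsed = os.clock() - start
  -- A GC cycle during fn can make the delta negative, so clamp it (estimate only)
  local allocKB = math.max(0, collectgarbage("count") - memBefore)

  local entry = timings[name]
  if entry == nil then
    entry = { total = 0, calls = 0, allocKB = 0 }
    timings[name] = entry
  end
  entry.total = entry.total + elapsed
  entry.calls = entry.calls + 1
  entry.allocKB = entry.allocKB + allocKB
  return result
end

-- One line per section: call count, average time and estimated allocation
function profileReport()
  local names = {}
  for name in pairs(timings) do names[#names + 1] = name end
  table.sort(names)
  local lines = {}
  for _, name in ipairs(names) do
    local e = timings[name]
    lines[#lines + 1] = string.format("%s: %d calls, %.3f ms avg, ~%.1f KB allocated",
      name, e.calls, 1000 * e.total / e.calls, e.allocKB)
  end
  return table.concat(lines, "\n")
end
```

In a Codea project you might wrap each phase of `draw()`, e.g. `profileSection("update", updateGameObjects)` where `updateGameObjects` stands for your own function, and print `profileReport()` every few seconds. A section with high allocation per frame is a likely source of garbage collection pauses.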

I tried running the GPU profiler, but it couldn’t detect Codea, and the Xcode FPS counter also wasn’t working, which is probably due to the lack of support for OpenGL ES.

Thanks for the response! Lots of good info for me here.

I don’t know how ortho works in the background; my assumption was that anything outside the defined area is not drawn and the remainder is stretched. But it sounds like you’re saying it instead makes layers? That would explain why the further I zoom in, the worse the performance becomes: more layers. I suppose an alternative solution would be to use setContext and do the cropping and stretching myself every frame?

Not sure why the viewer.preferredFPS call would take it down to 10 on the MacBook, unless there’s a default of 10 there. Either way, good to know; I thought it was a safe call to make every frame.

Regarding the shader, I’ve done a bunch of testing, including removing most branches and reducing the lights to only 1. From my testing on my iPad Pro M1, there’s no noticeable difference; perhaps the hardware is too fast for me to see the impact, or I wasn’t pushing it hard enough. My primary concern has been the zooming.

I really appreciate you taking the time to look into this. It seems that my primary issue is using ortho for zooming. I’ll have to try other things and see where that takes me, but I now have a path forward!

@skar The overdraw issue is not related to ortho drawing. Anything outside the screen is clipped and not drawn at all (i.e. the vertex shader runs but the fragment shader does not). What I mean is that each fragment that is on screen is being drawn and blended multiple times (even when the pixel alpha is opaque in some areas).

Think of it like this: say you have a resolution of 2048x1536 and you draw 100 opaque meshes (no blending) from front to back. Z-rejection will cull all the fragments you can’t see, since they end up behind the closest mesh. This means you end up drawing only 3,145,728 fragments (i.e. pixels) with your shader.

If you draw 100 sprites with blending turned on in the same way (i.e. full screen due to being zoomed in), you draw 314,572,800 fragments and also incur blending costs (i.e. more floating point calculations) every time. If you draw 100 sprites that only take up 20-30% of the screen, you still get overdraw, but the effect is lessened due to the smaller size. There is no trick that will make this kind of thing faster other than optimising your shader, and that can only take you so far, since it’s the GPU being choked by a huge number of fragments to draw.
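The arithmetic above can be reproduced with a few lines of Lua, which also makes it easy to plug in other resolutions, mesh counts or screen coverage. This is a back-of-the-envelope sketch: it assumes every blended mesh covers the same fraction of the screen, and `fragmentCounts` is a hypothetical helper name.

```lua
-- coverage is the fraction of the screen each mesh covers (1.0 = full screen).
function fragmentCounts(width, height, meshCount, coverage)
  local screenPixels = width * height
  -- Opaque meshes drawn front to back: early z-rejection means at most one
  -- shaded fragment per pixel, however many meshes overlap
  local opaqueMax = screenPixels
  -- Blended meshes: every mesh shades and blends every pixel it covers
  local blended = screenPixels * meshCount * coverage
  return opaqueMax, blended
end

local opaque, fullScreen = fragmentCounts(2048, 1536, 100, 1.0)
local _, quarterScreen = fragmentCounts(2048, 1536, 100, 0.25)
print(string.format("opaque <= %.0f, blended full screen = %.0f, blended 25%% = %.0f",
  opaque, fullScreen, quarterScreen))
```

At full-screen coverage the blended case is 100 times the opaque one; at 25% coverage it drops to a quarter of that, which is still 25 times the opaque count.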

To get an idea of how much overdraw you have, try changing your sprites to dark red quads and setting the blend mode to additive. The brighter an area is, the more overdraw you have. Some engines will also create a shader cost heatmap, which can be useful as well due to the way GPUs distribute shader workload.
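The additive-red trick works because each overlapping layer adds the same amount of red, so brightness is proportional to the number of layers at that pixel. A CPU-side analogue on a coarse grid shows the same idea and can be handy for reasoning about a scene layout. This is an illustrative sketch only: quads are hypothetical `{x, y, w, h}` rectangles measured in grid cells from a 0-based origin, and `overdrawGrid`/`heatColor` are made-up helper names.

```lua
-- Count how many quads cover each cell of a cols x rows screen grid.
function overdrawGrid(cols, rows, quads)
  local grid = {}
  for r = 1, rows do
    grid[r] = {}
    for c = 1, cols do grid[r][c] = 0 end
  end
  for _, q in ipairs(quads) do
    -- Clamp each quad to the screen, as clipping does on the GPU
    local c0, c1 = math.max(1, q.x + 1), math.min(cols, q.x + q.w)
    local r0, r1 = math.max(1, q.y + 1), math.min(rows, q.y + q.h)
    for r = r0, r1 do
      for c = c0, c1 do grid[r][c] = grid[r][c] + 1 end
    end
  end
  return grid
end

-- Each layer adds a fixed amount of red, clamped at full brightness,
-- which is what additive blending of a dark red quad produces on screen.
function heatColor(count, step)
  return math.min(255, count * step)
end

-- Example: print overdraw counts for a 4x3 grid, one row per line
local grid = overdrawGrid(4, 3, {
  { x = 0, y = 0, w = 4, h = 3 },  -- full-screen background
  { x = 1, y = 1, w = 2, h = 2 },  -- sprite in the middle
})
for r = 1, #grid do print(table.concat(grid[r], " ")) end
```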

Check out the answer to this stackoverflow question to see an example of this in action: https://gamedev.stackexchange.com/questions/192857/performance-issue-when-particle-system-occupies-most-part-of-the-screen

I should also note that, yes, culling is still a good idea: try not to draw anything that would not appear on screen at all, since you save CPU cycles by not submitting it to the GPU in the first place :slight_smile:
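A simple form of this culling is an axis-aligned bounds test against the visible area before each draw. This is a minimal sketch, assuming a hypothetical camera table `{x, y, zoom}` where `(x, y)` is the world point at the centre of the screen and bounds are `{x, y, w, h}` with a bottom-left origin (Codea’s y axis points up); adapt it to however your camera actually stores its state.

```lua
-- Returns true if the bounding box overlaps the visible world rectangle.
function isOnScreen(b, cam, screenW, screenH)
  -- Zooming in (zoom > 1) shrinks the visible world area
  local halfW = screenW / (2 * cam.zoom)
  local halfH = screenH / (2 * cam.zoom)
  local left, right = cam.x - halfW, cam.x + halfW
  local bottom, top = cam.y - halfH, cam.y + halfH
  return b.x + b.w >= left and b.x <= right
     and b.y + b.h >= bottom and b.y <= top
end

-- In a Codea draw loop this might look like (objects is hypothetical):
-- for _, obj in ipairs(objects) do
--   if isOnScreen(obj.bounds, cam, WIDTH, HEIGHT) then obj.mesh:draw() end
-- end
```

Note that culling only removes meshes that are entirely off screen; it does nothing for overlapping meshes that are all visible, which is where the overdraw cost comes from.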

Hmm, yes, I already do culling; you can see it in the GameObject update function. Any mesh that’s offscreen, including while zooming, is not drawn.

I’m trying to wrap my head around this. Why is blending added when zooming?

Wait, I think I get it: the mesh takes up the whole screen, so every pixel is drawn again for every mesh that covers that space.

@skar bingo!

Try testing a simple replacement shader that just outputs a white pixel or something; that way you can see if it’s actually the shader slowing things down.

Thanks again. It looks like there are some considerations I need to make, but overall zoom is still doable as long as I do it in a tricky way. Instead of zooming in, I’ll set the default resolution as the maximum zoom and zoom out by default. That way there can still be a zoom-in that doesn’t hurt performance, and the assets will look cleaner when zoomed in, since they were designed for that scale. I just need to make my assets a bit bigger from the start, with the understanding that I’m going to shrink them down for the real gameplay.

I’m also working on a different type of game now that isn’t affected by this, because there’s only one mesh.

I can’t really explain how much I value Codea and the community. I wish there were a way to donate. The purchase price alone is a steal.

I have one game almost finished, a second that has lots of work done but needs more features, and a third that is just in the conceptual phase. Being able to code all of these on my iPad is really great. I can’t wait to share more once they’re ready.

I’m glad you managed to find a workaround. Codea does have some optimisation issues (and performance limitations given that we use Lua) so there may be some places here and there where we can squeeze out more speed.

Your code/engine looks pretty advanced. Can’t wait to see what you end up with.

We are thinking about a few options for existing users to support us, such as a Patreon, as long as it doesn’t end up taking more time than it’d be worth.

Well, after a bunch of trial and error I narrowed it down to the lights part of the shader. Even with just 1 light and an array of 2, the slowdown is significant with more than 20-30 meshes. The zoom isn’t even really a factor, because in a full game there will be meshes everywhere for the environment, characters and props, filling up the whole screen and overlapping in many places.
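One thing worth ruling out when many meshes share the same light values is the Lua-side cost of reassigning identical uniforms to every mesh every frame, given the uniform lookup overhead mentioned earlier in the thread. Whether this accounts for any of the slowdown here is an assumption, not something the thread confirms, and `makeUniformSetter` is a hypothetical wrapper; it only caches numbers and booleans, because Codea’s vector types are userdata that can be modified in place.

```lua
-- Remembers the last scalar written to each uniform of a shader and skips
-- the assignment when the value hasn't changed. Tables and userdata
-- (arrays, vec3, etc.) are always re-sent.
function makeUniformSetter(sh)
  local last = {}
  return function(name, value)
    local t = type(value)
    if (t == "number" or t == "boolean") and last[name] == value then
      return false  -- unchanged, assignment skipped
    end
    sh[name] = value
    last[name] = value
    return true
  end
end

-- In Codea, one setter per shader object, created once (names hypothetical):
-- setLight = makeUniformSetter(m.shader)
-- setLight("lightCount", 1)
```

If timing shows no difference with the wrapper in place, the cost is more likely on the GPU side (per-fragment lighting multiplied by overdraw) than in uniform assignment.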

It seems that I may have to abandon using a shader if it has such a drastic impact. Without it I can have hundreds of meshes. It’s unfortunate, because I wanted to use the lights for a day/night cycle and for dynamic lighting effects on abilities and spells.

@skar Don’t know if this means anything or not, but I ran your first program at the top of this discussion with the FPSOverlay. This was on my iPad Air 3, which runs at 60 fps. With the program running and just doing restarts in Landscape mode, the Low value was 59 fps in 10 out of 10 restarts. When I switched to Portrait mode, the Low value dropped below 59 in 10 out of 10 restarts, at one point going as low as 34. Not sure if the problem is mesh or FPSOverlay.

@skar This implies that the lighting part of the shader is somehow responsible for the performance issues. I’ll have to look at that part specifically, but it seems odd that even 1 or 2 lights would have such a pronounced effect.

@dave1707 Thanks for checking it out. There does appear to be something off between Codea and FPSOverlay; check out the other thread:

https://codea.io/talk/discussion/13332/fps-overlay-shows-80-fps#latest

@John I appreciate you taking another look. I’ve made a few tweaks to the shader code, like commenting out unneeded branching and uniforms and inverting the division in the GetLight function, but again there’s minimal to no visible improvement.

Here’s my current scene code and shader code as zipped Lua files.

I’m stress testing with 1 light, 101 meshes and the fps is in the low 20s.