A simple way of culling unused vertices in 3D scenes

I’m working on a 3D FPS with rooms and corridors. In order to allow customisable levels, I’ve divided the playing area into tiles, and each tile can be occupied by one thing defined by the user. For buildings, that thing is a cube.

Of course, there are many unused vertices when you stack cubes together, and this has a definite speed impact. But I found it was quite easy to loop through the finished mesh and look for pairs of identical (rectangular) faces, ie with the same four corner vertices, and then delete both of them - because faces that back onto each other can be deleted. This cut the number of vertices by about 2/3, which makes a big difference to speed.

(Any other ideas welcome, I can always use more speed).

I thought you posted before that someone at apple said to not bother trying to do your own culling, because the graphics card does it automatically.

Slowest = leaving all the vertices in, and writing code that decides for each frame which ones to draw. This is what Apple was advising against, because…

Faster = leaving all the vertices in, and letting OpenGL manage which are drawn

Fastest = at the beginning, deleting all the vertices you don’t need (which is the approach above)

hope that’s clearer!

With your cubes, are you checking for faces that are on the bottom - I know your first pass will pick out the bottom faces for any cubes stacked on top of another, but what about the faces that lie directly on the floor and are also invisible?

good point. I did include the option of omitting bottom faces…

I am using tiles and cubes because you can get a nice scene with a simple ASCII map like this

bbbb_bbbbb
b__b_b____b
b__b_b____b
b___________
bbbbbbbbbb

This is how it looks (the recorder makes it a bit jerky)

https://www.youtube.com/watch?v=OrMJMrMwX3E

PS I realised why some of the early games used trampolines to get you between floors - because it requires no drawing of ladders etc! So that’s what I’m using.

Looks awesome!

Have you thought about grabbing some of the original Wolfenstein maps and textures for that real retro feel :slight_smile:

Very clever idea. For my 2.5D platform game, I’d been thinking about how to implement something similar. My idea was to scan the ascii level map, identify contiguous shapes and unique corners (sounds so simple when you write it down… nightmarishly complex in practice, and never left the drawing board) and then create a shape with the fewest number of vertices possible. This way of doing it is way simpler though!

Well, I’ve tried this every way I can think of over the past year or so.

For my 2.5D dungeon, I also used an ASCII map, but I drew walls (not cubes) along the middle of the tiles, which keeps vertices to a minimum. If you draw each straight wall length as one long rectangle (rather than a series of wall tiles), that also cuts vertices, but it would be hard to automate that.

The best trick in the dungeon was using distance based lighting, which meant I could clip the far plane at the same distance as the light, and that had a big effect on speed.

In the scene above, I’m using simple diffuse lighting outside just to make the doorways visible, and inside I basically just light everything fully.

Kind of off topic, but…

Does OpenGL only load the part of the scene that can be viewed when turning 360 degrees? Or is the whole map loaded?

When I was messing with Unreal Development Kit, there was a way of setting which areas were loaded, depending on where the player was standing. So only the part of the map that was in view was loaded. I forget what this process was called though.

This was an absolute necessity to getting good fps, especially on mobile.

OpenGL only draws what can be seen, but it has to process all the vertices in the scene. It is efficient, but the more vertices you have, the more work it has to do.

It can ignore anything that is “off screen”, and this works fast, but if, say, you are in a dungeon, and in front of you are 10 rooms behind a wall, each of which could be seen if there was nothing in the way, ie they are all “on screen”, then OpenGL has to look at all the pixels for each room, keeping the closest one for each position on the screen.

In my dungeon, I used clip to restrict drawing to the light radius of just 100 pixels, so if there were 10 rooms in front of me, 9 of them would be off camera, and this makes things much faster.

So from my own experience, the fastest techniques are

  1. Minimise the number of vertices

  2. Clip as much as you can

  3. Let OpenGL manage the rest

As an example of minimizing vertices, the ground surface of my scenes might be 2000x2000 pixels, with an image texture that is repeated hundreds of times, but it is just one rectangle, ie 6 vertices, thanks to a shader that tiles the texture.

magical?brilliant

Another approach that I used on my extended terrain flyer thing (which I never finished) was to use a quad tree to drill down to meshes, it means you have to have your meshes chopped up as per the grid, but it works well. I wasn’t but I’m sure you could cull not just what’s “in front” but also cull things too far away in front.

The other thing you can do if you have memory to burn is to have lower resolution meshes on larger quads in the tree so if it’s in front but far away you run with a lower res mesh and when it’s close use the smaller, higher res ones.

It’s a bit involved but if you are interested I can run through the approach.

@spacemonkey - if users can define their own maps using ASCII codes as shown above, can the program figure all the quad stuff out for itself?

If so, it might be interesting to compare approaches

Yeah, I guess if you are building the meshes from an ascii map then it would be relatively straight forward to build the meshes in pieces and the quad tree on top of it. My terrain thing has a whole bunch of other mojo around HeightMaps, but I’ll have a look at it sometime in the next day or two and see if I can pull out enough stuff to make it relevant to this. The biggest advantage (assuming memory doesn’t go bang) is you can then build up absolutely huge maps without necessarily hurting the rendering performance.

And ideally since the world isn’t dynamically changing is that you could do your culling pass during the build up from the ascii map…

A problem may be that you have layers, eg different floors in a building. This makes it simpler in the sense that once inside, you only have to draw the floor you’re on, but when you’re outside, you can see the whole outside of the building, so that approach won’t always work.

I’m not sure it’s worth the effort, because what I’m doing works pretty fast.

Yeah, but I haven’t been playing with Codea much, and my life is getting a little less busy… you’ve piqued my interest so I may do something just cos it’s there :wink:

I understand the ASCII map, how were you thinking of doing multiple layers? An ASCII map per layer?

I’m surprised someone hasn’t tried to implement the bsp system from quake1. It’s GPL open source. It’s preprocessor intensive tho.

Ok, did a little playing around with this. The gist below is a program that builds up a scene from an ascii layout held in the Map tab. It then has classes for 4 different forms of rendering. Comment out all but one on lines 9-12 of main. Finally, adding a distance limit to the perspective call on line 41 affects performance. I use profiler to gather performance across 1000 draw frames.

1 Cube mesh and drawing it many times: 0.334034s per frame
1 Single mesh: 0.011952
1 Single mesh with internal faces culled: 0.004340
50 Meshes 4x4cubes each culled in a quadtree: 0.008465

If you set a view distance of 15, then 1 culled mesh and the quadtree become 0.004161 and 0.004018.

So, a single big mesh wins for the map of 40x40x3. I would expect as map size rises or the complexity (maybe shapes other than cubes) increases the quad approach would start winning.

Interesting stuff.

https://gist.github.com/sp4cemonkey/ac2a490f0124de77c008

@spacemonkey - all of that is what I’d expect.

Drawing the same cube over and over is slowest because you’re drawing all the faces every time and having to translate/rotate each of them separately

A single mesh is faster, but can be speeded up nearly 3 times by removing unwanted faces - which are about 2/3 of the total number.

A quad tree requires quite a lot of calculation in Codea, but culls the number of vertices, so that figure looks reasonable.

Clipping the view distance effectively reduces the number of vertices, so it has a big effect.

The recurring factor for me is that for greatest speed, you need to cull vertices as efficiently as possible.

I would expect the quadtree to shine where for some reason, it is difficult to cull vertices. A good example might be a Minecraft clone, where you need to add or remove blocks dynamically. That means you can’t cull hidden faces permanently when you add blocks, because if any of the blocks are removed, there will be a hole. The culling needs to happen dynamically every frame.