These are confusing. My best understanding, please correct:
The identityMatrix is basically “no Matrix”. When you ask for the matrix of anything that hasn’t had any transforms applied to it, you get the identityMatrix.
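To check my own claim that the identity is "no matrix", here is a minimal sketch in plain Python (not Codea). The names `IDENTITY` and `transform_point` are made up for illustration. It just shows that multiplying a point by the identity hands the same point back.

```python
# 4x4 identity matrix as nested lists (row-major). Plain Python, not Codea's matrix type.
IDENTITY = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]


def transform_point(m, p):
    """Multiply matrix m by the point (x, y, z), treated as the column (x, y, z, 1)."""
    x, y, z = p
    col = (x, y, z, 1.0)
    out = [sum(m[r][c] * col[c] for c in range(4)) for r in range(4)]
    return tuple(out[:3])


point = (3.0, -2.0, 5.0)
unchanged = transform_point(IDENTITY, point)  # the identity leaves the point where it was
```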
The modelMatrix is the current global transforms applied to the coordinate system before drawing. So when you are positioning a 3D object using transforms, rather than directly placing it via x, y, z, and a manually calculated rotation, etc., asking for the modelMatrix before you reset everything via popMatrix will return a matrix that combines all of those transforms. Then after you popMatrix to reset everything, asking for the modelMatrix will return the identityMatrix again (assuming nothing else had been applied before the matching pushMatrix).
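Here is how I picture the push/transform/pop cycle, as a rough sketch in plain Python. The `MatrixStack` class and its method names are hypothetical stand-ins I invented, not Codea's actual implementation. The idea is only that transforms accumulate into one combined matrix, and popping restores whatever was saved at the push.

```python
def identity():
    return [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]


def multiply(a, b):
    """Standard 4x4 matrix product a * b."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]


def translation(tx, ty, tz):
    m = identity()
    m[0][3], m[1][3], m[2][3] = tx, ty, tz
    return m


class MatrixStack:
    """Hypothetical stand-in for the model matrix plus its push/pop stack."""

    def __init__(self):
        self.current = identity()  # nothing applied yet, so the identity
        self.saved = []

    def push(self):
        # Save a copy of the current matrix so pop() can restore it.
        self.saved.append([row[:] for row in self.current])

    def pop(self):
        self.current = self.saved.pop()

    def translate(self, tx, ty, tz):
        # Each transform is folded into the one combined matrix.
        self.current = multiply(self.current, translation(tx, ty, tz))


stack = MatrixStack()
stack.push()
stack.translate(10.0, 0.0, 0.0)
stack.translate(0.0, 5.0, 0.0)
combined = [row[:] for row in stack.current]  # both translations in one matrix
stack.pop()
restored = stack.current                      # back to the identity
```

Asking for the matrix between the push and the pop gives the combined transform. Asking after the pop gives back the saved one, which here is the identity.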
The viewMatrix is a combination of all transforms applied to the camera. So when you set the camera’s position, where it’s aiming, etc., you’re really applying those as transforms to the viewMatrix. And asking for the viewMatrix will get the result for you.
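One detail I think is worth sketching: moving the camera one way has the same effect as moving the whole world the opposite way, so a view matrix ends up holding the inverse of the camera's own placement. Below is a minimal plain-Python sketch for the simplest case, assuming a camera that is only moved (no rotation). The names `translation`, `transform_point`, and `camera_position` are mine, not Codea's.

```python
def translation(tx, ty, tz):
    return [
        [1.0, 0.0, 0.0, tx],
        [0.0, 1.0, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(m, p):
    """Multiply matrix m by the point (x, y, z), treated as the column (x, y, z, 1)."""
    x, y, z = p
    col = (x, y, z, 1.0)
    out = [sum(m[r][c] * col[c] for c in range(4)) for r in range(4)]
    return tuple(out[:3])


# Assumption: the camera sits 10 units back along +z and is not rotated.
camera_position = (0.0, 0.0, 10.0)

# The view matrix shifts the world by the opposite amount.
view = translation(-camera_position[0], -camera_position[1], -camera_position[2])

world_point = (1.0, 2.0, 0.0)
eye_point = transform_point(view, world_point)  # the point as seen from the camera
```

From the camera's point of view, the point now sits 10 units in front of it along negative z.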
The projectionMatrix is the hardest for me to understand and describe. It’s like there’s a small hole in a wall. If you stand back from the hole, you can only see what’s directly across from you. For instance, maybe there’s a rabbit on a hill and the hole is perfectly placed so that if you’re standing five feet back from the wall you can perfectly see the rabbit, framed by the hole. But you can’t see the hill or much of anything else. Then as you step closer to the hole, you can see more of the hill and the sky and everything around the rabbit. And with your eye right up against the hole, you see the closest to what you’d see if there was no wall at all. Now if one imagines that the hole is actually a video screen that’s set up to try to fool you into thinking it’s an oddly-rectangular hole in the wall, and that it updates its display based on how close you are to it, it can accomplish all this with different states of the projectionMatrix.
That analogy may make it seem like the projectionMatrix is identical to the viewMatrix, because they’re both about distance and the amount of information onscreen. And they are very similar, but the view is more about the placement of the hole relative to the landscape or world, and the projection is about the placement of your eye relative to the hole.
So an example would be if the other side of the wall was a model city, tipped on its side to face the hole, so that we seem to be looking down on the city. Now, first, let’s start from standing back five feet from the wall again. If we’re standing back from the hole, what do we see if we move the model closer and farther from the wall? We’re stationary but far from the hole, but the model is moving closer and farther from the hole. What we see through the hole is very similar to what we’d see if the model wasn’t a model at all, but a top-down drawing of the city, like a map view. In other words, we see the very tops of the buildings, getting closer and farther from the hole, but we never see any of the sides of the buildings–or very little, at least.
If it was a drawn map, that’s because the information about the sides of the buildings simply isn’t there, but if it was a model, that’s because we’re standing so far away. To illustrate, imagine the model (and alternately the map) is placed very close to the hole, showing (let’s say) a three-by-three grid of skyscrapers, and we start walking towards the hole. As we get closer and closer, in the scenario with the drawn map, we mostly see the same information, but closer. We’ll see a bit more of the buildings on the periphery of the hole as we get right up close, but the information we get about them won’t ever change; it’s still just the top-down drawing. As we get closer and closer to the model city, we also start to see more of the buildings on the periphery, but we see significantly different information about them. We don’t see them as if our eye is diagonal to a top-down drawing; we see them as if our eye is diagonal to a 3D building. In other words, we now see some of the sides of the buildings. We get that fish-eye effect, where the objects in the center of our view look like they’re pointing straight at us, and the objects on the periphery of our view seem like they’re leaning away from us.
The last step is again imagining that all this is happening on a digital screen that is trying to fool our eye into thinking it’s a hole in the wall. Assuming that the screen is using a 3D scene of a city, the screen could accomplish both the flat map effect and the model city effect merely by changing the projectionMatrix. I don’t know what the settings would actually be, but with one setting you’d see the 3D model as if it were a flat overhead drawing, and with another setting you’d see it as if it were an actual model city.
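As far as I can tell, those two settings are what is usually called an orthographic projection (the flat map look) and a perspective projection (the model city look). Here is a minimal sketch in plain Python using the standard OpenGL-style formulas for both matrices. The function names, the field of view, and the sample points are my own assumptions, chosen only to show the effect. `roof` and `base` are two points on the same vertical edge of a building, one nearer the eye and one farther.

```python
import math


def orthographic(left, right, bottom, top, near, far):
    """OpenGL-style orthographic projection matrix (row-major)."""
    return [
        [2.0 / (right - left), 0.0, 0.0, -(right + left) / (right - left)],
        [0.0, 2.0 / (top - bottom), 0.0, -(top + bottom) / (top - bottom)],
        [0.0, 0.0, -2.0 / (far - near), -(far + near) / (far - near)],
        [0.0, 0.0, 0.0, 1.0],
    ]


def perspective(fov_y_degrees, aspect, near, far):
    """OpenGL-style perspective projection matrix (row-major)."""
    f = 1.0 / math.tan(math.radians(fov_y_degrees) / 2.0)
    return [
        [f / aspect, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, (far + near) / (near - far), 2.0 * far * near / (near - far)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def project(m, p):
    """Apply projection matrix m to (x, y, z), then divide by w to get screen x, y."""
    x, y, z = p
    col = (x, y, z, 1.0)
    cx, cy, cz, cw = (sum(m[r][c] * col[c] for c in range(4)) for r in range(4))
    return (cx / cw, cy / cw)


# Same x and y, different depth: the roof is nearer the eye than the base.
roof = (1.0, 0.0, -5.0)
base = (1.0, 0.0, -10.0)

ortho = orthographic(-2.0, 2.0, -2.0, 2.0, 1.0, 20.0)
persp = perspective(45.0, 1.0, 1.0, 20.0)

ortho_roof, ortho_base = project(ortho, roof), project(ortho, base)
persp_roof, persp_base = project(persp, roof), project(persp, base)
```

With the orthographic matrix, the roof and base land on the same screen position, so the side of the building collapses to nothing, like the map. With the perspective matrix, the farther base is drawn closer to the center than the roof, so the side of the building becomes visible, like the model.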
Whew–that’s my best guess as to the difference between these things. Corrections please!