Welcome back! This week, I’m going to write about our decision to go 3D, and how we did that technically.
As you may remember from my last post, after a few weeks we actually had a fully working 2D game that was ready to ship, modulo artwork. We then decided to pimp the look of the game by bringing Arash onto the team. At this point, we made the decision to go 3D, but it involved a lot of thinking. Some of the core arguments:
- We were 3 people on the team now, all of us with a degree in 3D Computer Graphics
- The artist was significantly more experienced in creating 3D artwork than creating 2D artwork
- At that time, there weren’t any 3D RTS games on the AppStore
- Last, but not least, we love 3D, that’s why we studied it. Since TowerMadness was a fun project, we figured we wanna pick what is most fun to us.
There is one key problem with the third argument, being the only 3D RTS game on the AppStore. That is, we got beaten to the market by Star Defense. Another Tower Defense game. In 3D. Featured at a Steve Jobs keynote. And they were using spherical maps. You can imagine we were pretty shattered for a short bit. But we didn’t give up, we took TowerMadness and made it more beautiful, faster, and more fun. And now, almost two years later, we’re still here and kickin’. Seems like our passion helped us over that hurdle. Our lesson: don’t give up, even if (for the moment) it looks like you just got beaten in every possible regard.
Anyways, back to 3D engines.
3D Engine Overview
I’m reluctant to even call what we have in TowerMadness an “engine”. What it really is: a very pragmatic collection of utilities for, and a wrapper around, OpenGL ES 1. The features:
- It caches the OpenGL states, and allows data-driven modification of these through custom material files, like (yes, we love JSON 🙂):
"color": [1.0, 1.0, 1.0, 1.0],
- It can load .pvr textures (by the way, check out my little PVR Quicklook plugin: https://github.com/Volcore/quickpvr)
- It loads a .texture file, which has additional information about textures, such as filter modes, like
- It loads .vbo files, which are nothing more than externally pre-compiled vertex and index buffer objects that get loaded straight onto the GPU. We wrote a little C++ tool that loads the .obj and outputs the .vbo file
- It loads .model files, which combine all of the above, linking a material with a vbo file, like:
- It supports a point sprite cache, where you can queue up point sprites with certain materials, locations, sizes, and they get all rendered in one batch per material later
- It doesn’t free anything. Since the RAM and GPU footprint of TowerMadness is really small, we get away with that
- Everything is handle based. This turned out to be great. If a model couldn’t be loaded, or we forgot to load it, it didn’t crash the game. Rather, it always showed a little colored cube. In one instance, we forgot to add a .model and .vbo file to Xcode, and the game shipped without the flamethrower level 3 model. Instead, it would show the “flaming cube of death” (see below), as its fans called it. Imagine it had crashed the game instead.
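The state caching from the first bullet can be sketched in a few lines. This is only a minimal illustration, not our actual code: the Material fields other than the color and the StateCache class are made-up names, and a counter stands in for the real OpenGL calls so the snippet compiles without a GL context.

```cpp
#include <array>
#include <cstdint>

// Hypothetical material description. Only "color" is taken from the
// material file example; "texture" and "blending" are assumptions.
struct Material {
    std::array<float, 4> color{1.0f, 1.0f, 1.0f, 1.0f};
    std::uint32_t texture = 0;  // texture id, 0 = none
    bool blending = false;
};

// Remembers the last applied state and skips redundant changes.
// In a real renderer, every "++issuedCalls" would sit next to the
// matching OpenGL call (color, texture binding, blend enable/disable).
class StateCache {
public:
    void apply(const Material& m) {
        if (!valid_ || m.color != current_.color) { ++issuedCalls; }
        if (!valid_ || m.texture != current_.texture) { ++issuedCalls; }
        if (!valid_ || m.blending != current_.blending) { ++issuedCalls; }
        current_ = m;
        valid_ = true;
    }

    int issuedCalls = 0;  // state changes that actually reached the driver

private:
    Material current_;
    bool valid_ = false;  // false until the first material was applied
};
```

Applying the same material twice costs nothing the second time, and switching only the texture costs a single call.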
And that’s it, our “Engine”. In a nutshell, to the game code it’s no more than PGL_loadModel and PGL_drawModel, with a lot of OpenGL transforming and matrix pushing. But that’s all it needs. There is no in-engine culling (although we experimented with that, more on this later), no post processing, no lighting, no multi-texturing, no particle systems.
And that’s a little bit odd. When thinking about 3D engines, it always seems like this magical thing that has so many fascinating features. But those features often clutter the most important one, which is to place a certain model at a certain position. And that’s what you need most of the time, for a regular game. As such, it should be optimized to make it very easy to do that.
To make it easy to place objects into the game, having a good art pipeline is extremely important. This includes every step the artist needs to do in order to put his new model into the game. This is so important because the longer the pipeline, the longer the iteration times.
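To show why handles saved us from the crash, here is a rough sketch of the idea. The names (ModelRegistry, load, resolve) are invented for this post and are not the real PGL_loadModel / PGL_drawModel signatures, and a list of file names stands in for the app bundle.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using ModelHandle = std::size_t;
constexpr ModelHandle kFallbackCube = 0;  // slot 0 always holds the placeholder

struct Model {
    std::string name;
    bool isPlaceholder;
};

class ModelRegistry {
public:
    // availableFiles is a stand-in for whatever is in the app bundle.
    explicit ModelRegistry(std::vector<std::string> availableFiles)
        : files_(std::move(availableFiles)) {
        models_.push_back({"<fallback cube>", true});
    }

    // Always returns a usable handle. A missing file yields the cube.
    ModelHandle load(const std::string& name) {
        for (const auto& f : files_) {
            if (f == name) {
                models_.push_back({name, false});
                return models_.size() - 1;
            }
        }
        return kFallbackCube;
    }

    // Handles that were never handed out also resolve to the cube.
    const Model& resolve(ModelHandle h) const {
        return h < models_.size() ? models_[h] : models_[kFallbackCube];
    }

private:
    std::vector<std::string> files_;
    std::vector<Model> models_;
};
```

The draw path only ever goes through resolve, so a forgotten file turns into a visible cube instead of a null pointer.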
As such, ideally, you’d have something where a plugin in the DCC tool exports the object right into the game. But since we’re indies, and we normally don’t have time or money to develop these tools ourselves, we need to improvise. And since we’re agile, we start out somewhere, and then iterate with artist feedback until it’s good enough.
Initially in our engine, the artists had to export their models as .obj, then convert them to .vbo, create a .model, convert the texture to .pvr, create a .texture, create a .material file, then add everything to Xcode and add the PGL_loadModel/PGL_drawModel commands. That’s a lot of steps, especially considering that most of them will be identical. We realized this (way too late in development, though) and optimized it a little. The .model, .texture and .material files looked the same for most of the objects: solid, trilinearly textured models, where the filenames of the texture and the vbo are the same, modulo the file extension. Hence, we made those files optional, which saved the artist a lot of work and also avoided a lot of very small files.
I’m a performance fetishist. At least to some degree. I can’t stand it if a game doesn’t run at 30/60 Hz (depending on the genre). I will push the team for days to find the cause of stuttering, slowdowns, etc.
But I also refrain from low-level optimization as much as possible.
I believe it’s much more important to properly bound and control the amount of stuff rendered than to get the last 5% out of the rendering code just to be able to render 5% more stuff on screen, especially if it comes at the cost of ease of use and maintainability.
This is what went terribly wrong in TowerMadness. Not only were the new levels constantly pushing the size, number of spawn points (and hence number of aliens), buildable spaces and doodads on the screen, and not only was the art constantly getting a few more triangles here, a few more triangles there. We also defied all reason, and added an endless mode to the game. Before the endless mode, every level was limited in time and complexity. Because we knew how much money you were getting in total, we knew how many towers you would be able to build at most. And we knew how many aliens were there at most, because even if you slowed all of them, there was a finite and usually small number. But we (and the fans) wanted more. More towers, more aliens, more everything. So we added that endless mode, and suddenly there was no upper bound anymore. The players could send wave after wave, slow them down, populate large maps with many many towers, until everything was maxed with (rendering and cost) expensive nuke towers. As such, we’re regularly getting complaints about the game becoming unplayably slow in endless mode after some absurdly high wave that the game was never designed for. To date I’m not sure if adding the endless mode was a good idea.
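The “optional files” idea boils down to deriving everything from the base name unless a hand-written file says otherwise. A small sketch under that assumption (ModelDesc, defaultDesc and the "trilinear" string are illustrative names, not our real file format):

```cpp
#include <set>
#include <string>

// Hypothetical description of everything needed to draw one model.
struct ModelDesc {
    std::string vboFile;
    std::string textureFile;
    std::string filter;
    bool hasExplicitModel;  // a hand-written .model exists and should be parsed instead
};

// Builds the default description from the base name alone.
// "bundle" is a stand-in for the list of files shipped with the app.
ModelDesc defaultDesc(const std::string& base,
                      const std::set<std::string>& bundle) {
    ModelDesc d;
    d.vboFile = base + ".vbo";      // same name as the texture, other extension
    d.textureFile = base + ".pvr";
    d.filter = "trilinear";         // the common case described above
    d.hasExplicitModel = bundle.count(base + ".model") > 0;
    return d;
}
```

With this, the artist only drops tree.vbo and tree.pvr into the project, and only the odd special case still needs its own .model file.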
In general though, for TowerMadness the main performance problem in rendering is that we have a lot of stuff on the screen. There are many doodads (trees, barn, sheep, etc), aliens and towers. Just to give you some estimated numbers, I think there are about 40-50 tree groves, 1-50 towers, 0-400 aliens and 10-20 doodads on the screen. And the opportunity for batching is very limited. However, I think the most important reason why the game runs as fast as it does is that we’re doing a lot of “smart” batching. E.g. when rendering the towers, we first render all the tower bases, then render all the towers of one type, towers of the other type, etc. That way we minimize the state changes, but without actually performing any form of sorting on the engine side. It’s all pre-sorted.
We actually spent quite a while on optimizing. One other approach was to add frustum culling to every PGL_drawModel call. While it worked and culled a lot of stuff, the problem was that the generic culling was about as expensive as many of the rendering calls, which gave us no speedup at all. In the worst case (fully zoomed out), the game was running at about 50% of the speed it had without culling.
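To make the effect of the pre-sorted draw order concrete, here is a toy comparison that just counts material switches. It is a sketch with made-up names (Tower, perTowerOrder, groupedOrder), and strings stand in for materials; the fixed typeOrder list plays the role of the order that is simply written down in the game code.

```cpp
#include <string>
#include <vector>

struct Tower { std::string type; };

// Counts how often the bound material changes along a draw order.
int countMaterialSwitches(const std::vector<std::string>& drawOrder) {
    int switches = 0;
    const std::string* last = nullptr;
    for (const auto& m : drawOrder) {
        if (!last || *last != m) { ++switches; }
        last = &m;
    }
    return switches;
}

// Naive order: draw each tower completely (base, then its top part).
std::vector<std::string> perTowerOrder(const std::vector<Tower>& towers) {
    std::vector<std::string> order;
    for (const auto& t : towers) {
        order.push_back("base");
        order.push_back(t.type);
    }
    return order;
}

// Grouped order: all bases first, then all towers of each type.
// No sorting happens at runtime, typeOrder is a fixed list.
std::vector<std::string> groupedOrder(const std::vector<Tower>& towers,
                                      const std::vector<std::string>& typeOrder) {
    std::vector<std::string> order(towers.size(), "base");
    for (const auto& type : typeOrder) {
        for (const auto& t : towers) {
            if (t.type == type) { order.push_back(type); }
        }
    }
    return order;
}
```

For four towers alternating between two types, the naive order switches material on every draw call, while the grouped order switches once per group.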
Then we did a “smarter” culling approach, where we just cull certain things, like towers, which have several draw calls. This actually gave a nice speedup.
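For reference, the kind of check involved looks roughly like the following. This is a generic bounding-sphere-versus-planes test, not necessarily the exact test we shipped; Vec3, Plane and sphereVisible are illustrative names, and the planes are assumed to have normals pointing into the frustum.

```cpp
#include <array>

struct Vec3 { float x, y, z; };

// Plane n.p + d = 0, with the normal n pointing to the inside.
struct Plane { Vec3 n; float d; };

using Frustum = std::array<Plane, 6>;

// Conservative test: returns false only if the sphere lies completely
// behind at least one plane. It may report "visible" for spheres that
// are just outside near a corner, which is fine for culling.
bool sphereVisible(const Frustum& f, const Vec3& center, float radius) {
    for (const Plane& p : f) {
        const float dist = p.n.x * center.x + p.n.y * center.y +
                           p.n.z * center.z + p.d;
        if (dist < -radius) { return false; }
    }
    return true;
}
```

That is up to six dot products per object, which is cheap for a tower with several draw calls behind it and much less of a win for a model that is a single small draw call.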
The main problem, though, was the trees. Here we tried several things:
- Just render each tree grove separately
- Same as above, but with frustum culling
- Put all groves into one Vertex Array (VA) and render that (preprocessed, avoids state changes)
- Put all groves into one VBO and render that (preprocessed, avoids state changes)
- Put all groves into one VBO with dynamic culling (on the fly)
I sadly don’t have the numbers anymore, and they’d probably be outdated by now, because it was optimized on my old iTouch2G, with an MBX. There is one curious result though. On the MBX, using a VA was as fast as using a VBO, even if uploaded as a preprocessing step and then not changed in any frame. Apparently, the VBOs were _retransmitted every frame_ on the old MBX hardware! This is (luckily) no longer true for the SGX.
Eventually, we ended up using the second technique, rendering every grove separately and using a simple form of frustum culling.
3D Engine: The Bad
We have a very curious thing that comes up every once in a while. Some players, and reviewers, complain about how we’re not really using the 3D engine to do crazy things. Apparently, having a 3D engine means to them that it has to be used. However, I beg to differ. In the end, I think it didn’t do us much harm.
The other problem with 3D is that it’s most likely going to be slower and more work than making a 2D game, so I guess you need to know what you’re doing and you should like to suffer when you do this.
3D Engine: The Good
One thing I haven’t mentioned yet is that since we made the game in 3D, it scales up very well to different display sizes. Going to the iPad and later the Retina display, we had very little work to do. As far as I remember, it was just UI stuff that needed to be scaled up.
Other than that, we just love 3D, so that’s why we did it.
Would I do it again?
Yes. I think it turned out really well, and it matched the skills of the team perfectly. And remember that this was designed for the iPhone hardware of the first generation. With the A5 out now, this is going to be a wild place. We’ve done some very cool stuff in our upcoming titles, but I can’t talk about that. Not today 🙂
Next time I’ll be writing about my little AppEngine-bag-of-tricks and how I got the TowerMadness server to scale to millions of requests per day.