App Size

Hey iDevBlogADay,
I’ll only write a short post, as we’re in the final crunch for our next game. I’m super excited about it, and I’ll share more in the following weeks.
This week I’m just going to cover one aspect that is of particular importance to me and Limbic, and one that I think many devs ignore: reasons for keeping your app size small, and how to achieve that. Obviously, this doesn’t apply to every app (I couldn’t imagine Rage being a 20 MiB app), but especially for us indies, there are some important arguments to consider.
In our upcoming game, our artists did an incredible job optimizing the assets. Right now we’re at around an incredible 8 MiB, and it’s visually stunning (you’ll see :-).
Why keep the App Size small?
I’ll just do a classical bullet list collection of arguments:
  • First of all, there is the over-the-air download limit of 20 MiB. That means any app larger than 20 MiB cannot be downloaded via 3G and must be downloaded over WiFi. Obviously, this affects sales to a degree. I’ve heard numbers floating around of about a 15% sales drop if you go over the 20 MiB limit, so that’s something to consider very carefully.
  • There is a (somewhat complex) relationship between app size, loading time, and memory footprint of the app. And I have a fetish for fast loading times. Having a small memory footprint makes a lot of things easier for development, too; you can basically ignore the memory warnings, which is what we do in TowerMadness, and it works great.
  • The larger the app, the more likely I am to delete it when I hit the storage limit again (I only have a 16 GiB device, like most people). Keeping the app size small therefore means that people are more likely to keep it.
  • With a small app size you can ensure you’re not going over the 20 MiB limit even after many updates. TowerMadness went from 4 maps to more than 60, and it’s still below 20 MiB.
Obviously, there are also drawbacks, such as potentially prolonged development time, limited quality, and being prevented from using certain features, such as video. I remember one app that was very simple and could probably have fit in about 15 MiB. But it shipped with almost 200 MiB of video, which was shown exactly once.
What can be done to optimize the app size?
For us, the classical items that quickly add a lot of size to an application bundle are:
  1. Videos
  2. Audio
  3. Code Bloat
  4. Textures/Images and other GPU data
I can’t really say that much about 1), as we haven’t included any videos in our apps yet. I don’t know a lot about 2) either, so I can’t say how much you can gain by choosing different codecs, etc. We mostly use .caf files, which is probably not optimal, but that’s about all I know.
I can say a lot more about 3) and 4) though. Although it may sound a little strange, code size can blow up very quickly. I’ve done a little survey and found binary sizes ranging between 1 MiB and 10 MiB for apps of approximately equal complexity. That difference can mean being able to double the amount or quality of your assets and still stay within the 20 MiB limit. One thing that causes code bloat is third-party libraries. I’ve seen anything from +10 KiB of executable size for a library up to almost a megabyte, where you wouldn’t expect it. And often, it doesn’t even correlate with the complexity of the library’s functionality. So, an important hint here is to watch the executable size when you add libraries.
Another important factor is language-related code bloat. Especially if you use C++ with a lot of templates, this can happen pretty quickly. I haven’t tried this on iOS, but I regularly had issues with Boost blowing up the binary size significantly. The problem with templates in C++ is that, if not written carefully, they can produce a lot of duplicate code that essentially does the same thing, but for a different type. So, a second hint would be to watch out for these effects and take executable size into consideration.
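To make this concrete, here is a small, hypothetical C++ sketch (not from our codebase): every type you instantiate the template with gets its own copy of essentially identical machine code, and the classic mitigation is to push the bulk of the work into a single non-template function that a thin template wrapper forwards to.

#include <cstddef>
#include <cstdint>
#include <cstring>

// Each instantiation (RemoveAt<Tower>, RemoveAt<Alien>, ...) emits its own,
// nearly identical copy of this function into the binary.
template <typename T>
void RemoveAt(T* items, size_t count, size_t index) {
    std::memmove(items + index, items + index + 1,
                 (count - index - 1) * sizeof(T));
}

// Mitigation: one shared non-template implementation...
void RemoveAtBytes(void* items, size_t count, size_t index, size_t elem_size) {
    uint8_t* base = static_cast<uint8_t*>(items);
    std::memmove(base + index * elem_size, base + (index + 1) * elem_size,
                 (count - index - 1) * elem_size);
}

// ...and a thin, inlineable template wrapper, so the per-type code stays tiny.
template <typename T>
inline void RemoveAtThin(T* items, size_t count, size_t index) {
    RemoveAtBytes(items, count, index, sizeof(T));
}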
Multi-objective optimization
The fourth item on the list is a rather complex topic. On the one hand, you want your assets to be as high quality as possible. On the other hand, you want a low loading time, and small app size.
So you have three different objectives that affect each other. For example, a common way to reduce file size is to use PNG or JPEG compression for texture data. While this has a huge effect on file size, it negatively affects loading time: the PNG is loaded into memory, CPU-intensively decompressed into a larger buffer, and then uploaded to the GPU. It can also have grave implications for your memory footprint. While your PNGs are small on disk, once decompressed they usually sit in memory as RGB or RGBA, using up to 32 bits per pixel. A single 512×512 texture then already takes up 1 MiB in memory (and on the GPU). To counteract that, you can do some real-time encoding to a different format (such as RGB565), but that takes up even more CPU time. So you have a difficult balance between file size, CPU time, and memory footprint, which can quickly tip to one side or another, especially if you’re not paying attention and just adding the PNGs your artists hand you.
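To make the numbers concrete: 512 × 512 pixels × 4 bytes is exactly 1 MiB. And here is a minimal, hypothetical sketch of the kind of real-time RGB565 re-encoding mentioned above (not our production code); it halves the in-memory size at the cost of an extra CPU pass after the PNG has been decompressed.

#include <cstddef>
#include <cstdint>
#include <vector>

// Convert decompressed RGBA8888 pixels (4 bytes each) to RGB565 (2 bytes each).
// For a 512x512 texture this shrinks 1 MiB down to 512 KiB, dropping alpha and
// some color precision in the process.
std::vector<uint16_t> ConvertRGBA8888ToRGB565(const uint8_t* rgba, size_t pixel_count) {
    std::vector<uint16_t> out(pixel_count);
    for (size_t i = 0; i < pixel_count; ++i) {
        uint16_t r = rgba[i * 4 + 0] >> 3;  // 8 bits -> 5 bits
        uint16_t g = rgba[i * 4 + 1] >> 2;  // 8 bits -> 6 bits
        uint16_t b = rgba[i * 4 + 2] >> 3;  // 8 bits -> 5 bits, alpha is dropped
        out[i] = static_cast<uint16_t>((r << 11) | (g << 5) | b);
    }
    return out;
}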
Our solution
So, here is what we do to tackle this problem at Limbic:
First of all, to minimize loading time, we preprocess everything into a GPU-ready format. That means our meshes are stored in the exact format of the VBO data we upload to the GPU. For textures, we use the .pvr format. Our textures themselves are therefore not compressed (unless we use PVRTC), but they are encoded in the GPU-ready pixel format. This gives the artists direct feedback on the final size of a texture and the strain it puts on the GPU. We then put all the files into a custom package format that’s similar to Zip, with a few performance enhancements. This approach gives us really small load times, and the memory footprint is about equal to the uncompressed size on disk.
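As a rough illustration of why pre-baked GPU-ready data loads so quickly, here is a hypothetical sketch (function name and file layout invented for this example, not our actual loader): the bytes on disk are already in the layout the GPU wants, so loading is a read plus a single glBufferData call, with no parsing or conversion in between.

#include <cstdio>
#include <cstdlib>
#include <OpenGLES/ES1/gl.h>  // iOS OpenGL ES 1.x header; adjust for your platform

// Load a pre-baked vertex buffer: read the raw bytes and hand them straight to
// the GPU. There is nothing to parse because the offline tool already wrote the
// data in the exact interleaved format the renderer expects.
GLuint LoadPrebakedVBO(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return 0;
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    void* data = std::malloc(size);
    size_t read = std::fread(data, 1, (size_t)size, f);
    std::fclose(f);
    if (read != (size_t)size) { std::free(data); return 0; }

    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);  // straight upload
    std::free(data);
    return vbo;
}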
Another important element of size optimization is a demanding producer who, from day one, constantly keeps an eye on file sizes and game performance, and lets the responsible people know if something is getting out of hand.
Conclusion
All the hassle finally pays off for us. For our latest game, that means gorgeous visuals, an 8 MiB file size, a 4 s unoptimized initial loading time, and no further delays, for a full 3D game with a completely 3D menu. It’s a constant struggle to get there, but I’m already looking forward to it for the next project.
Back to crunch mode. 🙂
Update:
Bob Koon sent me a link to his page: http://www.appsizematters.com/ Lots of great information there!

TowerMadness Retrospective Part 3

Hey everyone,
as promised last time, I’ll write a little bit about the server side of TowerMadness, and the design decisions we faced. This will be very non-technical.
Preface
I should probably build a little bit of credibility before I start writing about all this server stuff. I’ve been working with web technology forever, and I love the technical side of it. My main achievement in this area prior to the TowerMadness server infrastructure is a little project called Waaagh!TV, which I created in 2004 and which has since streamed tens of thousands of Warcraft 3 games in real time to around a million viewers in total.
That said, TowerMadness was a special challenge. At the time, there weren’t any proven frameworks like OpenFeint or Plus+ yet, so we had no choice but to do it ourselves (although we probably would’ve done it ourselves anyway). We had a lot of special requirements, too. Competitiveness was very important to us, and it’s one of the main selling factors of TowerMadness. Replays of every game are analyzed and stored online for other players to watch. We also had DLC maps long before IAPs, although we’re now back to just updating the game, which is also an effective marketing tool. We had real-world prize competitions, but after they generated little publicity we stopped doing them.
The other factor was that we had no idea how well TowerMadness would sell. At that time, there were games that sold 250k+ copies, which seemed like an unreachable number to us. Because of our focus on competitiveness, we asked ourselves an important question: what if it is successful and climbs high up the ranks? It would be terrible if, during that short window of huge exposure, the server system couldn’t handle the load and severely tainted the user experience.
As you can see, we had a lot of things planned, and fooled around a lot. As such, we wanted a flexible and scalable online system.
TowerMadness Server System
We initially started out with a Dreamhost-hosted PHP+MySQL server in early beta. It was very cheap, but that’s about it. It had many downtimes and high latency, and it seemed impossible to scale, so we gave it a no-go.
We then investigated many of the up-and-coming cloud server systems: EC2, AppEngine, Rackspace. We picked AppEngine mostly because of the ease of use, the very lovely free tier, and the fine-grained scaling (money- and request-wise), all without the manual intervention that EC2 and Rackspace would have required at the time. It seemed like a perfect solution for a startup that doesn’t yet know what it will score.
Long story short, AppEngine has proven to be a great decision. We’ve scaled up from only a few requests a minute in beta to 30 requests per second, which is millions of requests a day. Once in a while I needed to improve the scalability of the code, mostly because I didn’t know how the next order of magnitude of requests would pressure the datastore and CPU, and hence made uneducated mistakes. At no time, however, was our service seriously affected by prolonged interruptions (apart from occasional and very rare AppEngine maintenance).
Word of caution
AppEngine is not a classical LAMP system, with long request times and a full-featured MySQL database. The datastore is very scalability oriented, and it is really not much more than a hash-map that maps keys to automatically serialized objects. The query functionality is very limited.
In my eyes, it simply tries to prevent you from doing anything unscalable (although you still can :-). That’s the reason the backing database is so limited. Personally, I learned a lot about scaling into the millions-of-requests-a-day realm by understanding how AppEngine works. There are a lot of interesting things I learned that I should probably share on this blog.
Nevertheless, as a summary I should say that I love AppEngine for what it is: a cheap, efficient, and very scalable solution for many server problems. It was the perfect choice for TowerMadness. But it’s not for everyone, and not for every problem. I think the key to a successful web infrastructure is to pick whatever gets the job done best, and it takes a lot of time and experience to find a good solution. By now we’re running several EC2 nodes, many AppEngine apps, and a few dedicated servers, and we’re still learning our way through this.
Game Accounts
Another very interesting problem related to the server question is user authentication. As a game developer, you want an easy-to-use and reliable way of uniquely identifying your users. This enables a lot of interesting features, like online high scores, cloud storage of user progress and settings, and sharing friend scores on social networks. However, it can turn out to be pretty tricky.
In TowerMadness, we used a user-entered username plus the device UDID to uniquely identify users. We also wanted to support multiple accounts per device, a decision that made our lives harder while adding little to the game, as only a very, very small fraction of users use this feature. Later on we added GameCenter. This caused a lot of trouble and still creates issues now and then.
All in all, I’m not so happy with how the whole situation turned out. The solution we have in TowerMadness is far from ideal, and I would definitely change it if I were to do it again.
However, the whole user authentication is still an unresolved problem, so here is a quick overview of the most popular techniques that I’m aware of:
Game Center:
Apple’s built in iOS social network for games.
+ uniquely identifies a user
+ built-in authentication
+ provides a social network of friends
– GameCenter adoption rate not great (probably because the signup and authentication is not great yet)
– GameCenter API has issues
– GameCenter does not provide secure authentication for a third-party server (such as an OAuth token), so storing private information is a bad idea. Your user id is public, so someone can easily mess with your data.
– locked into the iOS platform (hopefully MacOS soon)
UDID:
Using the Device’s unique ID to identify a user.
+ no signup, no question to the user, very lightweight
– no secure authentication
– uniquely identifies device, not user (can’t easily transfer to new device)
Email+password system:
Asking for the user email and securing it with a password
+ uniquely identifies users across devices
+ can be combined with a newsletter signup
+ people don’t forget their email as easily as a random user name
– people need to sign up with their emails, which can be awkward
– you need to manage password recovery, password changing, etc
Facebook:
Using the Facebook ID of the user and valid authentication tokens to uniquely identify the user and integrate into the Facebook platform.
+ tight viral marketing integration
+ secure authentication via OAuth
+ Everyone has an account
+ people don’t forget the login, you don’t need to take care of account management
– some people are allergic to Facebook logins/signups
Twitter:
Same as facebook, except:
+ in my experience fewer allergies to typing in Twitter credentials (probably because there’s less private information on Twitter)
– fewer users, and hence less viral potential
Third Party:
Using a third party service like OpenFeint, AGON, Plus+, Scoreloop, Cloudcell.
+ usually easy to integrate and provides a lot of convenience (cloud storage, etc)
+ no worries about server side
+ usually secure authentication
– gives control out of your hands; the more important the online business is for your app, the less attractive this becomes
– can go out of business (see AGON shutting down http://developer.agon-online.com/blog/)
– may not provide a social network/friend graph
– high fragmentation between OpenFeint, GameCenter, Plus+, Agon, etc.
– usually adds a large footprint to your binary
As you can see from my unscientific and biased presentation, there is still no perfect solution for the online account system. However, my hopes are still for a better GameCenter API with secure server authentication, better user experience and MacOS support.
I really hope that one of the frameworks will eventually become ubiquitous, so that integrating social features (such as friend scores) is more effective, as it’s something really fun. I love comparing my scores to my friends’; it’s so much more personal than fighting for rank #342523 against “MikeDevil147”.
But for now, it seems that either going with a third-party framework, or rolling your own thing with multiple different authentication methods (Facebook, GameCenter, maybe email+password), is the best solution for a game like TowerMadness.
Conclusion
A scalable server side can be a very difficult problem and should never be underestimated. User authentication is also something that still needs to be solved in the long run.
As usual, let me know in the comments if you have any specific questions, I’d love to answer them.

TowerMadness Retrospective Part 2

Hey everyone,
welcome back! This week, I’m going to write about our decision to go to three dimensions, and how we did that technically.
Why 3D?
As you may remember from my last post, after a few weeks we actually had a fully working 2D game that was ready to ship, modulo artwork. We then decided to pimp the look of the game by bringing Arash onto the team. At this point we made the decision to go 3D, but it involved a lot of thinking. Some of the core arguments:
  • We were now 3 people on the team, all of us with a degree in 3D computer graphics
  • The artist was significantly more experienced in creating 3D artwork than creating 2D artwork
  • At that time, there weren’t any 3D RTS games on the AppStore
  • Last, but not least, we love 3D; that’s why we studied it. Since TowerMadness was a fun project, we figured we’d pick what is most fun for us.
There is one key problem with the third argument, being the only 3D RTS game on the AppStore: we got beaten to market by Star Defense. Another tower defense game. In 3D. Featured in a Steve Jobs keynote. And they were using spherical maps. You can imagine we were pretty shattered for a short while. But we didn’t give up; we took TowerMadness and made it more beautiful, faster, and more fun. And now, almost two years later, we’re still here and kickin’. It seems our passion helped us over that hurdle. Our lesson: don’t give up, even if (for the moment) it looks like you just got beaten in every possible regard.
Anyways, back to 3D engines.
3D Engine Overview
I’m reluctant to even call what we have in TowerMadness an “engine”. What it really is: a very pragmatic collection of utilities for, and a wrapper around, OpenGL ES1. The features:
  • It caches the OpenGL states and allows data-driven modification of them through custom material files like the following (yes, we love JSON 🙂):
"lightning": {
    "tex": "lightning",
    "depthmask": 0,
    "color": [1.0, 1.0, 1.0, 1.0],
    "cull": "none",
    "blend": "alpha"
}
  • It can load .pvr textures (by the way, check out my little PVR Quicklook plugin: https://github.com/Volcore/quickpvr)
  • It loads a .texture file, which has additional information about the texture, such as filter modes, like:
{
    "type": "pvr",
    "file": "landmineA",
    "mag": "linear",
    "min": "linear_mipmap_linear"
}
  • It loads .vbo files, which are nothing more than externally pre-compiled vertex and index buffer objects that get loaded straight onto the GPU. We wrote a little C++ tool that loads the .obj and outputs the .vbo file
  • It loads .model files, which combine all of the above, linking a material with a vbo file, like:
"landmineA": {
    "vbo": "landmineA",
    "material": "landmineA"
}
  • It supports a point sprite cache, where you can queue up point sprites with certain materials, locations, and sizes, and they all get rendered in one batch per material later
  • It doesn’t free anything. Since the RAM and GPU footprint of TowerMadness is really small, we get away with that
  • Everything is handle-based. This turned out to be great. If a model couldn’t be loaded, or someone forgot to load it, it didn’t crash the game. Instead, it always showed a little colored cube. In one instance, we forgot to add a .model and .vbo file to Xcode, and the game shipped without the flamethrower level 3 model. Instead, it showed the “flaming cube of death” (see below), as its fans called it. Imagine if it had crashed the game instead. A minimal sketch of this handle-plus-fallback idea follows right after this list.
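Here is that sketch (names and layout are hypothetical, not the actual TowerMadness code): a handle is just an index into a model table, and slot 0 is permanently occupied by a placeholder cube, so a missing asset degrades into a visible oddity instead of a crash.

#include <cstring>

struct Model { unsigned vbo; unsigned material; };  // whatever the engine needs per model

enum { kMaxModels = 256, kFallbackCube = 0 };       // handle 0 = built-in colored cube

static Model g_models[kMaxModels];
static char  g_names[kMaxModels][64];
static int   g_model_count = 1;                     // slot 0 is reserved for the cube

typedef int ModelHandle;

// Attempt to load the model files; returns false if anything is missing.
static bool TryLoadModelFiles(const char* name, Model* out);

ModelHandle PGL_loadModel(const char* name) {
    for (int i = 1; i < g_model_count; ++i)
        if (std::strcmp(g_names[i], name) == 0)
            return i;                               // already loaded, reuse the handle
    if (g_model_count >= kMaxModels)
        return kFallbackCube;
    if (!TryLoadModelFiles(name, &g_models[g_model_count]))
        return kFallbackCube;                       // the "flaming cube of death" case
    std::strncpy(g_names[g_model_count], name, sizeof(g_names[0]) - 1);
    return g_model_count++;
}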

And that’s it, our “Engine”. In a nutshell, to the game code it’s not much more than PGL_loadModel and PGL_drawModel, with a lot of OpenGL transforming and matrix pushing. But that’s all it needs. There is no in-engine culling (although we experimented with that, more on this later), no post-processing, no lighting, no multi-texturing, and no particle systems.

And that’s a little bit odd. When thinking about 3D engines, they always seem like these magical things with so many fascinating features. But all that often obscures the most important feature, which is placing a certain model at a certain position. And that’s what you need most of the time for a regular game. As such, an engine should be optimized to make exactly that very easy.
Pipeline
To make it easy to place objects in the game, having a good art pipeline is extremely important. This includes every step the artist needs to take in order to get a new model into the game. It is so important because the longer the pipeline, the longer the iteration times.
As such, ideally, you’d have something where a plugin in the DCC tool exports the object right into the game. But since we’re indies, and we normally don’t have time or money to develop these tools ourselves, we need to improvise. And since we’re agile, we start out somewhere, and then iterate with artist feedback until it’s good enough.
Initially in our engine, the artists had to export their models as .obj, convert them to .vbo, create a .model, convert the texture to .pvr, create a .texture, create a .material file, then add everything to Xcode and add the PGL_loadModel/PGL_drawModel commands. That’s a lot of steps, especially considering that most of them are identical for every asset. That’s what we realized (way too late in development, though), and we optimized it a little. The .model, .texture, and .material files looked the same for most of the objects: solid, trilinearly textured models, with the texture and vbo sharing the same filename modulo the extension. Hence, we made those files optional, which saved the artists a lot of work while also avoiding a lot of very small files.
Performance
I’m a performance fetishist, at least to some degree. I can’t stand it if a game doesn’t run at 30/60 Hz (depending on the genre). I will push the team for days to find the cause of stuttering, slowdowns, etc.
But I’m also refraining from low level optimization as much as possible.
I believe it’s much more important to properly bound and control the amount of stuff rendered than to squeeze the last 5% out of the rendering code just to render 5% more stuff on screen, especially if it comes at the cost of ease of use and maintainability.
This is what went terribly wrong in TowerMadness. Not only were the new levels constantly pushing up the size, the number of spawn points (and hence the number of aliens), the buildable spaces, and the doodads on screen, and not only was the art constantly gaining a few more triangles here and there. We also defied all reason and added an endless mode to the game. Before the endless mode, every level was limited in time and complexity. Because we knew how much money you could earn in total, we knew how many towers you would be able to build at most. And we knew how many aliens there would be at most, because even if you slowed all of them down, there was a finite and usually small number. But we (and the fans) wanted more. More towers, more aliens, more everything. So we added that endless mode, and suddenly there was no upper bound anymore. Players could send wave after wave, slow them down, and populate large maps with many, many towers until everything was maxed out with (rendering- and cost-)expensive nuke towers. As a result, we regularly get complaints about the game becoming unplayably slow in endless mode after some absurdly high wave that the game was never designed for. To this day I’m not sure if adding the endless mode was a good idea.
In general though, the main rendering performance problem in TowerMadness is that we have a lot of stuff on screen. There are many doodads (trees, the barn, sheep, etc.), aliens, and towers. Just to give you some estimated numbers, I think there are about 40-50 tree groves, 1-50 towers, 0-400 aliens, and 10-20 doodads on screen, and the opportunity for batching is very limited. However, I think the most important reason the game runs as fast as it does is that we do a lot of “smart” batching. E.g., when rendering the towers, we first render all the tower bases, then all towers of one type, then all towers of the next type, and so on. That way we minimize state changes without actually performing any sorting on the engine side; it’s all pre-sorted.
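Here is a rough sketch of that pre-sorted batching (the engine calls and data layout are stand-ins, not the real TowerMadness code): towers are already grouped by type, so the renderer changes material state once per type instead of once per tower.

// Hypothetical stand-ins for the engine's material/draw calls.
void PGL_setMaterial(const char* name);
void PGL_drawModelAt(const char* model, float x, float y, float z);

enum TowerType { kTowerFlame = 0, kTowerLaser, kTowerNuke, kTowerTypeCount };
struct Tower { float x, y, z; };
struct TowerBatch { Tower towers[64]; int count; };

// Game code keeps the towers grouped by type, so no sorting happens at draw time.
static TowerBatch g_batches[kTowerTypeCount];
static const char* kTowerModels[kTowerTypeCount] = { "flame", "laser", "nuke" };

void DrawTowers() {
    // Pass 1: all bases share one model/material, so they form one long batch.
    PGL_setMaterial("tower_base");
    for (int t = 0; t < kTowerTypeCount; ++t)
        for (int i = 0; i < g_batches[t].count; ++i)
            PGL_drawModelAt("tower_base",
                            g_batches[t].towers[i].x,
                            g_batches[t].towers[i].y,
                            g_batches[t].towers[i].z);
    // Pass 2: one state change per tower type, then every tower of that type.
    for (int t = 0; t < kTowerTypeCount; ++t) {
        PGL_setMaterial(kTowerModels[t]);
        for (int i = 0; i < g_batches[t].count; ++i)
            PGL_drawModelAt(kTowerModels[t],
                            g_batches[t].towers[i].x,
                            g_batches[t].towers[i].y,
                            g_batches[t].towers[i].z);
    }
}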
We actually spent quite a while on optimizing. One other approach was to add frustum culling to every PGL_drawModel call. While it worked and culled a lot of stuff, the problem was that the generic culling was about as expensive as many of the rendering calls, which gave us no speedup at all. In the worst case (fully zoomed out), the game ran at about 50% of the speed it had without culling.
Then we tried a “smarter” culling approach, where we only cull certain things, like towers, which take several draw calls each. This actually gave a nice speedup.
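For reference, the per-object test behind this kind of culling looks roughly like the following sketch (a generic sphere-vs-frustum check, not the exact code we used). The point of the anecdote above is that for models of only a handful of triangles, this test can cost about as much as just drawing them.

// Frustum plane in the form ax + by + cz + d = 0, with the normal (a, b, c)
// pointing into the frustum.
struct Plane { float a, b, c, d; };

// A bounding sphere is culled if it lies completely behind any of the six
// frustum planes; otherwise it is (conservatively) considered visible.
bool SphereInFrustum(const Plane planes[6],
                     float cx, float cy, float cz, float radius) {
    for (int i = 0; i < 6; ++i) {
        float dist = planes[i].a * cx + planes[i].b * cy +
                     planes[i].c * cz + planes[i].d;
        if (dist < -radius)
            return false;  // fully outside this plane -> cull the object
    }
    return true;
}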
The main problem, though, were the trees. Here we tried several things:
  1. Just render each tree grove separately
  2. Same as above, but with frustum culling
  3. Put all groves into one Vertex Array (VA) and render that (preprocessed, avoids state changes)
  4. Put all groves into one VBO and render that (preprocessed, avoids state changes)
  5. Put all groves into one VBO with dynamic culling (on the fly)
I sadly don’t have the numbers anymore, and they’d probably be outdated by now anyway, because everything was optimized on my old iTouch2G, with an MBX. There is one curious result, though. On the MBX, using a VA was as fast as using a VBO, even when the VBO was uploaded as a preprocessing step and never changed in any frame. Apparently, the VBOs were _retransmitted every frame_ on the old MBX hardware! This is (luckily) no longer true for the SGX.
Eventually, we ended up using the second technique: rendering every grove separately with a simple form of frustum culling.
3D Engine: The Bad

We have a very curious thing that comes up every once in a while: some players and reviewers complain that we’re not really using the 3D engine to do crazy things. Apparently, to them, having a 3D engine means it has to be used. However, I beg to differ. In the end, I don’t think it did us much harm.
The other problem with 3D is that it’s most likely going to be slower and more work than making a 2D game, so you need to know what you’re doing, and you should like to suffer a little when you do this.
3D Engine: The Good

One thing I haven’t mentioned yet is that since we made the game in 3D, it scales very well to different display sizes. Going to the iPad and later the Retina display, we had very little work to do. As far as I remember, it was just UI stuff that needed to be scaled up.
Other than that, we just love 3D, so that’s why we did it.
Would I do it again?
Yes. I think it turned out really well, and matched perfectly to the skills of the team. And remember that this was designed for the iPhone hardware of the first generation. With the A5 out now, this is going to be a wild place. We’ve done some very cool stuff in our upcoming titles, but I can’t talk about that. Not today 🙂
Next time I’ll be writing about my little AppEngine-bag-of-tricks and how I got the TowerMadness server to scale to millions of requests per day.

TowerMadness Retrospective Part 1

[Note: Blogger owned me a little bit when I tried to put the formated post in here, so the version that went up on iDevBlogADay had some issues. They’ve since been edited]
I’m excited to finally be able to contribute to this collective of great iPhone indie wisdom! Over the next weeks, I’ll talk about some technical decisions and features of our first game, TowerMadness, and how they turned out in the end. While this post is meant to give others with similar design decisions more input, it also helps me to realize what was and is going on.
But first…
…let me give you an idea of who I am, and what qualifies me to write on the #iDevBlogADay list. I’m Volker, co-founder, Shogun of Technology, Emperor of Web Development and King of Game Design at Limbic Software. In short, I’m the one to blame if any of our apps doesn’t perform well, be it on the performance, stability, or gameplay side. I met the other two co-founders while doing my master’s at UCSD (for which, after 2 years, I finally submitted a paper, yay!), where the company was born in early 2009. After a very short PhD period in Aachen in late 2009, I’m also proud to be able to call myself a college dropout. That was the time when TowerMadness was starting to get really successful, and I had the chance to become a full-time indie gamedev. Now, in 2011, we’ve been to #1 on the App Store, earned many awards, and racked up more than 5 million downloads. What a crazy journey.
TowerMadness
Our first game, TowerMadness, was initially really just a little fun project, thought up by Iman and me on surfboards at La Jolla beach. Because of time constraints (a 60h/week M.S. thesis), we mostly worked on it on the weekends, after surfing. Inspired by countless Flash and Warcraft 3 tower defenses we had played, and a passionate love for them, we actually had an almost complete game after only a few weeks.

The iTD Prototype


We lovingly called it iTD, and it was a lot of fun. But the problem was that neither Iman nor I were artists at all (I gloriously painted all the tower icons up there in Gimp, the sheep too). Combine that with the fact that Fieldrunners had just come out and was taking the AppStore by storm. Although we were up to par in terms of gameplay, our graphics were atrocious. So we got Arash on board, who is not only a great programmer with a strong 3D background but also a great 3D artist, founded the company, and started from scratch.

TowerMadness as you can find it on the AppStore


This was also the point in time when a lot of important design decisions were made.
  • The original iTD prototype was written in almost 100% Objective-C. We tried our best at proper OOP; every little dude had its own class. And it was slow. Because we didn’t know the platform very well and wanted to play it safe, we decided to go with a minimalistic approach and use C99 to develop TowerMadness. It turned out to work really well: the performance is great, and we never had any issues with crash bugs, even in the early alpha stage. Of course, it may have turned out the same with any other programming language, but the end result is proof that C99 worked out just fine for us.
  • Back in iTD you were shooting the sheep. However, after an intervention by my girlfriend, we decided that we didn’t want to hurt animals and started shooting at aliens instead.
  • Because Arash, Iman and I are all Master’s or PhD candidates in 3D computer graphics, we decided that making the game 3D would be worth a shot, especially since there hadn’t been any 3D tower defense games on the AppStore. I’ll talk about the 3D stuff in more detail in the coming weeks.
  • We decided to launch with only a few maps. The first alpha had 1 map and 3 towers, the beta had up to 3 maps and 9 towers, and the release then had 4 maps and 9 towers. The game now, after almost two years of updates, has around 60 maps and 12 towers.
  • We started out with a Dreamhost’d php script for the server side, but before launch we switched to AppEngine — a great move, considering we’re getting millions of requests a day now.
In retrospect, we used a very pragmatic and iterative development strategy, which was probably the reason it ever actually got finished. At the time we were all still doing something else full-time.
In the rest of this post, I’ll talk about a special little feature that, so far, I believe no other RTS game on the AppStore has: replays.
Replays
Replays are a very interesting aspect of an RTS game. You couldn’t imagine Starcraft 2 or Warcraft 3 without the ability to watch replays of yourself or your favourite players.
The idea is that every game a player plays is recorded and can be played back. You can then take that replay and share it on the internet, analyze it with an automated tool, use it for debugging, or run cheat-proof competitions with it.
Sadly this topic can get very boring and specific very quickly, so I’ll try to focus on the important, design-relevant parts and the pros and cons. If you’d like to know more about this topic, let me know in the comments.
How does it work?
How this works depends on your game. If you have a game with few entities (e.g., a classical shooter like Quake), you can get away with just storing the positions of the players, items, and bullets in snapshots, and then storing delta updates and important events in between. For an RTS game like TowerMadness, or even Starcraft, this doesn’t work because you have so many units and buildings; it would be an immense amount of data.
Instead, in an RTS game, you only record the game commands. For TowerMadness, that sounds pretty simple, since there are only 4 such commands: Build Tower, Upgrade Tower, Sell Tower, and Start Game/Send Next Wave. When playing back a game, the game simulates each and every step that the original game did, and applies the commands at exactly the same times. This is where it gets really tricky, as every computation needs to be exactly as it was when the original game was played. This synchronicity is one of the key challenges in implementing replays in an RTS game.
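To give a feel for what “recording only the commands” means, here is a minimal, hypothetical sketch (the struct layout is invented for illustration, not our actual format): each command stores the simulation frame it was issued on, and playback re-runs the deterministic simulation and applies each command at exactly that frame.

#include <cstdint>
#include <vector>

struct GameState;  // the deterministic simulation state (towers, aliens, money, ...)

// The four TowerMadness commands; everything else is reproduced by re-running
// the simulation deterministically.
enum CommandType : uint8_t { kCmdBuildTower, kCmdUpgradeTower, kCmdSellTower, kCmdSendNextWave };

struct Command {
    uint32_t    frame;  // fixed simulation tick the command was issued on
    CommandType type;
    uint16_t    cell;   // grid cell for build/upgrade/sell (hypothetical encoding)
    uint8_t     tower;  // tower type for build
};

struct Replay { std::vector<Command> commands; };

// Playback: step the simulation tick by tick and apply every recorded command
// at exactly the frame it was originally issued.
void PlayBack(const Replay& replay, GameState& state,
              void (*step)(GameState&),
              void (*apply)(GameState&, const Command&)) {
    uint32_t frame = 0;
    size_t next = 0;
    while (next < replay.commands.size()) {
        while (next < replay.commands.size() && replay.commands[next].frame == frame) {
            apply(state, replay.commands[next]);
            ++next;
        }
        step(state);
        ++frame;
    }
}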
Keeping it Synchronous
On the one hand, if you follow a few simple rules, such as “the gamestate may never depend on external factors besides the input”, it seems pretty easy to keep a game in sync. On the other hand, the problem is that every tiny bug has the potential to break the replay feature.
When the synchronicity breaks, the result can be compared to a butterfly effect: the game slowly starts to become “weird”. As an example, imagine your computation for which enemy to shoot at uses an external input (such as rand()). When you play back a replay, instead of aiming at the little alien as in the original simulation, which yields money very quickly, the replayed game aims at a large alien. That way, the player doesn’t get the money in time to execute the next recorded build command, so the replayed game refuses to execute that command. And without that second tower, you’re in trouble, and the game will inevitably take a dramatically different and usually fatal course.
To detect such issues instantly, every few frames, and every time we execute a command, we compute a checksum of the game state. This checksum is then verified when the replay is played back. The checksum doesn’t tell you what went wrong, but it tells you _that_ something went wrong. And that is usually a great indicator of a bug somewhere in the code.
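What such a checksum can look like in practice (a hypothetical sketch, with made-up state fields and a simple FNV-1a hash rather than whatever we actually use): hash only the values that belong to the deterministic simulation, never pointers, timers, or render-only data.

#include <cstddef>
#include <cstdint>

// FNV-1a over a byte range: not cryptographic, but cheap and good enough to
// flag a desynchronized simulation.
static uint32_t Fnv1a(const void* data, size_t size, uint32_t hash = 2166136261u) {
    const uint8_t* bytes = static_cast<const uint8_t*>(data);
    for (size_t i = 0; i < size; ++i) {
        hash ^= bytes[i];
        hash *= 16777619u;
    }
    return hash;
}

// Hypothetical simulation state; only deterministic values are hashed.
struct GameState {
    int32_t  money;
    int32_t  lives;
    uint32_t wave;
    int32_t  alien_hp[400];
};

uint32_t GameStateChecksum(const GameState& s) {
    uint32_t h = Fnv1a(&s.money, sizeof(s.money));
    h = Fnv1a(&s.lives, sizeof(s.lives), h);
    h = Fnv1a(&s.wave, sizeof(s.wave), h);
    h = Fnv1a(s.alien_hp, sizeof(s.alien_hp), h);
    return h;
}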
One method I use frequently to keep the replay feature sane is to run the replay at the same time as the real game during development, in lockstep, and trigger a breakpoint as soon as the game goes out of sync.
What can you do with it?
There is the obvious feature of giving players the opportunity to watch their own, their friends’, and other people’s games to learn about other strategies. Some people counter this by saying that the replay feature makes the game boring, because you know the “solution” to a level. While this is true to some degree, I don’t see why players shouldn’t be able to see the right solution if they want to. And the replay feature stirs up competition: we’ve had many people on our leaderboards pushing the strategies forward, beating each other by just a few points at a time.
Another interesting feature is that you can analyze the uploaded replays on the server side to learn about popular towers, easy and hard maps, and to compute statistics. This is what we did for the first year or so. But after a while it became apparent that the sheer amount of replays submitted was too much even for my quad-core i7 running 8 instances of the stats bot at the same time. So now we compute the stats on the client side and submit them with the replay. We can then randomly and selectively verify whether someone is cheating, especially if they have high scores on the leaderboards.
As we’re all gamers and love competition (I’m totally into Starcraft 2, as you can see from my account), another interesting feature is running competitions. We did so by releasing new maps for which the public replay option was disabled, and everyone could submit their replay for a week. After the week, whoever got the highest legit score won. With replays you can actually verify that every game submitted was legit: whatever “cheating” you may do, it has to result in a game that plays back correctly.
One of TowerMadness’ key features is the competitive play. Because the leaderboards are the heart of the competitiveness, we regularly clean them from any cheating attempts using a little verification bot.
Replay Cons
Here is a summary of what is not so great about planning and implementing a replay feature:
  • Players can easily get confused if the replays are out of sync. In general, I would say that no replay feature is better than a broken replay feature.
  • It can be really difficult to get this feature right, particularly regarding the non-reproducibility of IEEE 754 floats, and math libraries and trig functions that have tiny differences across platforms. I’ve managed to get our code working on iOS (all devices), MacOS 32- and 64-bit, and Linux x32 (x64 had a few issues), but it was a long, non-trivial process.
  • It took us about half a year post launch to iron out all the issues from the replay feature.
  • If several people work on the game code, it gets even harder, as you have to educate them all about writing synchronous code. Here, it really helps to have MS/PhD colleagues though 🙂
Replay Pros
And here are the reasons why you would want it anyway:
  • Although the replay feature itself is somewhat hidden in TowerMadness (we could’ve done a better job at integrating it), it still has a profound impact on many aspects of the game and game design. Additionally, players that know about it love it.
  • The replay feature is a strong driver for the competitiveness of our players. We’ve got people that battle it out on our leaderboards for days.
  • Replays can be a great help in balancing the game: by looking at how players actually play, you can detect abuses and game mechanics that are too strong or too weak.
  • Replays enable cheat-proof competitions. However, eventually these didn’t really matter, and didn’t drive the sales any more than the leaderboards already did, so we abandoned them.
Was it worth it?
Now, eventually it all comes down to these questions: was the replay feature worth its time and effort, and would I implement it again?
I must confess I am a little bit biased here, because I also run a broadcasting service for Warcraft 3, Waaagh!TV, where we take the replay data and broadcast it to thousands of viewers in real time. As such, I probably value this kind of technology a little more highly than the average person does.
However, I do believe that the replay feature has an impact on the success of TowerMadness. It’s one of the most competitive Tower Defense games on the AppStore, and the competitiveness is amplified by giving the players more than just a score they have to beat.
Replays are also a valuable tool for the development process, helping you find bugs and assisting the game design by showing you how players actually play your game.
As such, I would say that the feature was definitely worth it for us, considering our focus was on creating a competitive game.
Conclusion
Wow, that was a pretty long post. I hope it didn’t bore you guys too much, in case anyone actually even reads all the way down to here. Next week I will write another tech retrospective, probably about our 3D “engine”.

Starcraft 2 Model Format Pt. 2

Due to popular request, here is the second installment of my Starcraft 2 .m3 model format series.
First of all, I’ve gotten quite a few mails and comments on this topic. This includes some links to other people also working on the format, so here is a collection of links:
Thanks everyone for the links!
Today I’ll chat a little about the stored meshes. There are three parts to rendering out simple meshes: the vertices, the faces, and the regions. But first, we need to find them.
That’s where the MODL chunk comes in. It should be considered the header of the m3 format. It defines where to find what. However, we’re only interested in a few parts. It’s important to note that I’ve found two different versions of the MODL header, version 0x14 and 0x17. 0x14 is slightly different, and I’m not going to talk about it in this post.
struct MODLHeader {
    uint32 stuff_we_dont_need_atm[17]
    uint32 flags
    uint32 vertex_data_size
    uint32 vertex_tag
    uint32 div_count
    uint32 div_tag
    [… some more data we don’t care about…]
}
The precious parts here are the reference to the vertices, the reference to the div, and the flags. The flags come in handy when parsing the vertices, as they describe what can be found in the vertex format. In detail:
if (flags & 0x40000) == 0:
    vertex_size = 32
else:
    vertex_size = 36
There is definitely other information in the flags, but this is what we need right now. The vertices can now be found at the vertex tag, which is just a collection of bytes. The number of vertices is vertex_data_size / vertex_size. The format is as follows:
struct Vertex {
    float32 position[3]
    uint8 boneweights[4]
    uint8 boneindices[4]
    uint8 normal[4]
    uint16 uv[2]
    if vertex_size == 36: {
        uint8 unknown[4]
    }
    uint8 tangent[4]
}
Position is just the object space position of the vertex.
Boneweights range from 0 to 255 and represent the weight factor (divided by the sum of all weights) for each bone matrix.
Boneindices are indices that point to the corresponding bone matrix.
The normal is compressed and can be extracted as c = 2*c/255.0-1 for each component.
The uvs are scaled by 2048, so they need to be divided by 2048 to be used in opengl.
Finally, the tangent is compressed in the same way as the normal.
To get the bitangent and set up a correctly oriented, orthonormal tangent space, we take the cross product of the normal and tangent and multiply it by the w component of the normal: (n.xyz cross t.xyz)*n.w
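Putting the decoding rules above into code, here is a small sketch (in C++ rather than my Python tool, with helper names made up for this example):

#include <cstdint>

struct Vec3 { float x, y, z; };

static Vec3 Cross(const Vec3& a, const Vec3& b) {
    return Vec3{ a.y * b.z - a.z * b.y,
                 a.z * b.x - a.x * b.z,
                 a.x * b.y - a.y * b.x };
}

// One component of a compressed normal/tangent: c = 2*c/255 - 1.
static float DecodeByte(uint8_t c) { return 2.0f * c / 255.0f - 1.0f; }

// Unpack the normal, tangent, bitangent, and UVs of one .m3 vertex.
void DecodeVertexAttributes(const uint8_t normal[4], const uint8_t tangent[4],
                            const uint16_t uv[2],
                            Vec3* out_n, Vec3* out_t, Vec3* out_b, float out_uv[2]) {
    *out_n = Vec3{ DecodeByte(normal[0]), DecodeByte(normal[1]), DecodeByte(normal[2]) };
    *out_t = Vec3{ DecodeByte(tangent[0]), DecodeByte(tangent[1]), DecodeByte(tangent[2]) };
    // Bitangent = (n.xyz cross t.xyz) * n.w, giving an orthonormal tangent space.
    float nw = DecodeByte(normal[3]);
    Vec3 b = Cross(*out_n, *out_t);
    *out_b = Vec3{ b.x * nw, b.y * nw, b.z * nw };
    // UVs are stored scaled by 2048.
    out_uv[0] = uv[0] / 2048.0f;
    out_uv[1] = uv[1] / 2048.0f;
}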
The DIV is really just a container for two other important chunks, the faces and the regions:
struct DIV {
    uint32 indices_count
    uint32 indices_tag
    uint32 regions_count
    uint32 regions_tag
    [… some other stuff …]
}
The triangles are just stored as triplets in the indices ‘U16_’ tag. There are indices_count/3 triangles in the mesh.
To correctly render the mesh, we also need the region chunk:
struct Region {
    uint32 unknown    // Note: updated, thanks to NiNtoxicated!
    uint16 vertex_offset
    uint16 vertex_count
    uint32 index_offset
    uint32 index_count
    uint8 unknown[12]
}
Now, to render a mesh, we can iterate through all regions, and for each region render index_count/3 triangles with the indices from the indices chunk. That looks something like this:
def drawModel(self):
    # div and vertices are assumed to be the parsed DIV chunk and vertex list
    for region in div.regions:
        GL.glBegin(GL.GL_TRIANGLES)
        for i in range(region.index_count):
            v = vertices[div.indices[region.index_offset + i]]
            GL.glTexCoord(v.uv[0] / 2048.0, v.uv[1] / 2048.0)
            GL.glVertex(v.position[0], v.position[1], v.position[2])
        GL.glEnd()
And that’s it! Setting up the proper textures and materials is rather complex and is def. worth another blog entry 😛 That’s it for now. As usual, let me know if you have any questions or comments!
PS: let me know if you know how to properly format source code on blogger 🙂

Starcraft 2 Model Format Pt. 1

Due to a number of requests I’ve received, I’ve decided to write down some of my findings in the .m3 model format. Here we go:
Overall structure
The .m3 format is a mixture of the World of Warcraft .m2 format and the Warcraft 3 .mdx format, in that it has a parsing structure similar to the former but uses tags like the latter. The list of tags with offsets is located at the end of the .m3 file. Specifically, the file header is:
struct M3Header {
    fourcc header_tag
    uint32 tagindex_offset
    uint32 tagindex_size
    uint32 unknown1
    uint32 unknown2
}
The header_tag should be ‘MD33’. The tagindex is the aforementioned list of tags; it starts at tagindex_offset and has tagindex_size elements. I suspect that unknown1 and unknown2 may point to the tag where the recursive parsing starts, which is the ‘MODL’ tag. The elements of the tagindex have the form:
struct Tag {
    fourcc tag
    uint32 offset
    uint32 repetitions
    uint32 version
}
The first two elements, tag and offset, are pretty obvious, I guess. The interesting part starts with repetitions. Each tag describes a fixed-size structure, which may be a little different between files, hence the version number. The number of repetitions tells us how often this structure is repeated in the chunk. As an example, a string chunk (with a ‘CHAR’ tag) describes a string that consists of repetitions characters of 1 byte each (unless it’s UTF-8, but I’ve not encountered any special characters yet). This is great for loading, because we don’t need dynamic parsing. We can just allocate and read the structure as many times as requested, without worrying about dynamically sized content.
If dynamic sizes are required, they are placed in a separate chunk and referenced via the tag. Again, a good example is strings, which you can find all over the file, usually directly after the chunks where they are needed.
All chunks are padded to 16 byte boundaries using 0xaa.
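As an illustration (a C++ sketch rather than my Python tool; real code should also mind endianness and struct packing), reading the tag index boils down to reading the header, seeking to tagindex_offset, and reading tagindex_size Tag entries:

#include <cstdint>
#include <cstdio>
#include <vector>

#pragma pack(push, 1)
struct M3Header {
    char     header_tag[4];      // 'MD33'
    uint32_t tagindex_offset;
    uint32_t tagindex_size;      // number of Tag entries
    uint32_t unknown1;
    uint32_t unknown2;
};
struct Tag {
    char     tag[4];
    uint32_t offset;
    uint32_t repetitions;
    uint32_t version;
};
#pragma pack(pop)

// Read the header, then jump to the tag index at the end of the file and read
// all entries, so chunks can later be resolved by their index.
std::vector<Tag> ReadTagIndex(std::FILE* f) {
    M3Header header;
    std::fseek(f, 0, SEEK_SET);
    if (std::fread(&header, sizeof(header), 1, f) != 1) return {};
    std::vector<Tag> tags(header.tagindex_size);
    std::fseek(f, (long)header.tagindex_offset, SEEK_SET);
    if (std::fread(tags.data(), sizeof(Tag), tags.size(), f) != tags.size()) return {};
    return tags;
}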
Parsing
Once the tag directory has been read, my parsing starts at the ‘MODL’ tag (and I suspect Blizzard’s parsing does, too). The ‘MODL’ chunk is essentially the root of the parsing tree for the .m3 format, just like the header of the .m2 format, but instead of referencing the other chunks by offsets, they are referenced by their index in the tagindex. The references usually also contain the number of repetitions, which makes them quite easy to spot.
That’s it for today and the general parsing and file structure. Next time I’ll get into the details of detecting the vertex format, reading out the vertices and faces and finally rendering a rough version of the model.
As an outlook, here are some more pictures (Templar, Ultralisk, Hydralisk):


Blizzard’s art team is just incredible!

Starcraft 2

By now most of my regular readers probably know that I’m going to pause working at the university by the end of February and start working full-time for Limbic soon after. That means I’ll also start to blog more again.
By now most of you have also heard the exciting news that the Starcraft 2 beta launched last week. As I have done many times before, I couldn’t resist digging a little into the game data formats to see what I could find. Of key interest to me is the .m3 model format used in SC2. It’s a natural evolution of the Warcraft 3 .mdx format and the World of Warcraft .m2 format.
For as long as I can remember, I’ve been dissecting the formats of games I liked from a technical point of view, in order to learn how those great games were made (some prime examples are Half-Life, Quake 3, Black & White, Warcraft 3, and World of Warcraft). Hence, for the sake of the good old times and out of pure curiosity, I couldn’t resist opening up the SC2 files in my hex editor and digging around a little. To my surprise, the .m3 format is quite a bit different from .mdx and .m2. It didn’t take too long, though, to find my way through the file and extract the meaningful bits of data I needed to get a better understanding of the whole thing. So far I can say that I already love the SC2 engine, even more than the WC3 and WoW engines.
Here are two examples of cool models I rendered with my python tool, only vertices+normals for now (Hatchery and Hydralisk):

If someone is interested in my findings, or if I find more time to make the tool more worthwhile, I may upload it somewhere (github, probably), and I’m happy to share my insights with anyone interested.

Appstore Piracy

http://www.macrumors.com/2010/01/13/cost-of-app-store-piracy-pegged-at-450-million/

I have absolutely no clue where they got those numbers from, but I cannot confirm the 75% piracy rate they claim at all. For TowerMadness, it has been at most 10%, even before we released Zero.
Also, if their numbers are right, 0.45 billion paid downloads would imply 1.35 billion warez downloads. Does anyone know an app piracy page that comes even remotely close to that number of downloads? Assuming an average of 5 MiB per app (and that’s probably a rather low estimate), that would be 6.29 PETABYTES of traffic. That’s probably around $670k just for the traffic if hosted on Amazon S3.
And whoever started calling it piracy should be put on a boat, sent to the Horn of Africa, and made to hang out with the real pirates a little.
‘Nuff said.