Multithreaded Renderer on iOS

Hey #iDevBlogADay,
You’ve probably seen this: You start a game. After the loading is complete, the game runs smoothly for a brief period of time, then it suddenly starts stuttering significantly for a few seconds, culminating in an automated banner: “Welcome back to GameCenter”. If it’s an action game, you may just have lost a life to the stutter.
Last week, I tried to investigate this, and a potential cause was revealed to me: GameCenter runs in the same NSRunLoop as everything else, on the main thread. Hence, when it connects and performs the SSL authentication, encryption, and decryption, it blocks the main thread. Apparently, this costs enough time to delay the rendering.
Not only was this cause revealed to me, but also a potential solution: Put the entire OpenGL rendering onto a separate thread, so it’s not tied to the temper of the NSRunLoop and its many potential input sources. So I set out to try this.
Multithreading OpenGL Requirements
Writing a multithreaded OpenGL renderer isn’t trivial. And due to my perfectionism, I wanted to do it right. That means:
  • Clean implementation
  • Fires exactly at display refresh, using CADisplayLink
  • As simple as possible, trying to avoid any low-level multithreading if possible
  • Since a single EAGLContext can only be used on one thread at a time, ideally everything should be only on the secondary thread, and no code should run on the main thread.
These requirements led me to my first approach.
Using GCD
I love Grand Central Dispatch (GCD). It’s a great way to parallelize and defer code execution. And some tests I conducted showed that the overhead caused by blocks is tiny.
Hence, I created a new serial queue (as suggested by the Apple iOS OpenGL documentation), and essentially queued all calls to OpenGL in blocks on that queue, which runs on a different thread. One of the first problems that arose was that setting up CADisplayLink on that thread doesn’t work, because the GCD queue threads don’t have an NSRunLoop, which is what CADisplayLink uses. Hence, my display link callback wouldn’t be called at all.
However, since CADisplayLink doesn’t call any OpenGL code by itself, I moved it onto the main thread, and then dispatched the draw event onto the rendering queue from there. Now the callback got triggered at 60hz, as expected. But the rendering didn’t work. I’m pretty sure I enforced the right EAGLContext on every draw event. And after a couple of dispatch_asyncs, the [context presentRenderbuffer:] function would stall for 1s at a time. I fiddled around a lot with this, but couldn’t get it to work.
If I changed the rendering queue to run on the main thread (by using the main GCD queue), it magically worked well. But then everything was executed on the main thread, which set me back to the beginning. That’s as far as I’ve gotten with the GCD approach.
Using a separate NSThread
My second attempt involved an NSThread. The setup was very simple: When the GLView was created, I created the new thread, and inside it I started an NSRunLoop. Then I set up the CADisplayLink to run on this run loop. And it worked: The CADisplayLink fired reliably and the scene was rendered correctly. However, there was a small issue: There seems to be no (reliable) way to terminate the run loop. Hence, once started, I couldn’t stop the rendering anymore. That’s not really what I needed.
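For reference, the setup looked roughly like this (a minimal sketch of the idea, not our actual code; renderThread, renderThreadMain and drawFrame: are placeholder names):

// Called when the GL view is created: spawn the render thread.
- (void)startRenderThread {
    renderThread = [[NSThread alloc] initWithTarget:self
                                           selector:@selector(renderThreadMain)
                                             object:nil];
    [renderThread start];
}

// Entry point of the render thread.
- (void)renderThreadMain {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    // Attach the display link to this thread's run loop, so drawFrame:
    // gets called on the render thread at display refresh.
    CADisplayLink *link = [CADisplayLink displayLinkWithTarget:self
                                                      selector:@selector(drawFrame:)];
    [link addToRunLoop:[NSRunLoop currentRunLoop] forMode:NSDefaultRunLoopMode];
    // This is the part I couldn't shut down cleanly: run never returns
    // as long as the display link keeps the run loop busy.
    [[NSRunLoop currentRunLoop] run];
    [pool release];
}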
A classical approach
This is as far as I’ve gotten. The next thing I want to try is to create an NSThread that runs a very simple loop: It sleeps until a semaphore is signaled, then renders one frame, and then sleeps again. On the main thread, I run the display link and signal the semaphore whenever it fires. This is very old school, but at least I can turn it off at any time, and it appears to meet all the requirements I have.
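In rough Objective-C, the plan looks something like this (just a sketch of the idea, untested; frameSemaphore, shouldQuit, glContext and renderFrame are placeholder names):

// Main thread: the display link only signals the semaphore.
- (void)displayLinkFired:(CADisplayLink *)link {
    dispatch_semaphore_signal(frameSemaphore);
}

// Render thread: a dumb loop that can be shut down at any time.
- (void)renderThreadMain {
    [EAGLContext setCurrentContext:glContext];
    while (!shouldQuit) {
        // Sleep until the display link signals a frame; the timeout lets us
        // notice shouldQuit even if the display link is paused.
        dispatch_semaphore_wait(frameSemaphore,
            dispatch_time(DISPATCH_TIME_NOW, 100 * NSEC_PER_MSEC));
        if (shouldQuit) break;
        [self renderFrame];
        [glContext presentRenderbuffer:GL_RENDERBUFFER];
    }
}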
Summary
What seemed like a nice little afternoon project is starting to cost a lot of time. However, considering that moving the rendering to a separate thread seems like the best solution for the GameCenter login stutter, and for any other stutter caused by high run loop latency, it seems worth the effort.
Once I’ve found a good solution, I plan to make it open-source, and also include my performance monitor thingie that I described in my last post.
To close this post, I’d like to ask everyone out there: Have you written a threaded renderer on iOS? How did you make it work? Did it work reliably?
Cheers,
Volker

Nuts performance levels

Hey iDevBlogADay,
Since my last blog post I’ve arrived at the Limbic HQ in Palo Alto, CA. We’ve also launched our latest game, Nuts!

Today I’m going to write about how we managed to make Nuts! a beautiful 3D game running at 60hz even on the 3GS. I’ll go into the details of many common optimizations and try to analyze how much they actually gained us.
Measuring Performance
The most important thing for optimizing performance is having a way to measure it, and to track how it changes as you modify the application.
The tool of choice for us is a little plug-in module we call the performance monitor. It records wall-clock render times, game update times, and idle times. Idle time is the period spent between an update and the next render, usually sleeping, updating Cocoa, etc. You can see an annotated example here.
It’s a very valuable tool: because it plots the performance of individual parts of the app over time, it helps you correlate performance events with potential causes. In our games, it’s only compiled into the developer version, and it can be activated by double-tapping the top left corner of the screen.
30hz vs 60hz?
This is a question I’m very passionate about. Recently at the Santa Cruz iOS Game Dev meeting, the great Graeme Devine mentioned this as well. It’s very important that your game runs smoothly. Although most users will not admit it, sluggishness, unresponsiveness, and stuttering are huge factors in instantly putting a game away.
Keeping this in mind, when you’re at the early stage of a new project, and you have a working prototype, you need to make an important decision: Do you want to go for 30hz or 60hz?
Let’s think about this for a second. If you decide to go for 30hz, it means you can have twice as much stuff in one frame, compared to a 60hz game.
Many people will also argue that no one will notice the difference between 30hz and 60hz. They could not be more wrong. It really depends on the game you’re making. For Nuts!, we experimented with both 30hz and 60hz, and 30hz, although it had a very smooth and stable framerate, just didn’t feel right. It was less responsive, and it didn’t play well. Plus, people were more likely to get motion sickness from it, which is a big factor as the game involves a camera that is constantly rotating around a tree. Hence, we knew the game had to be 60hz, and we took this into consideration for all the art and further engineering.
For another game that we’re currently working on, it’s a completely different story. It is a very different kind of game, and 30hz is completely fine. And because it’s 30hz, we can show more stuff, and at higher quality.
General Engine Design
To start out, I’d like to give you a small overview of how our game and engine are structured. Our OpenGL ES 2.0 engine is really simple and “dumb”: it doesn’t do any kind of automatic batch sorting. All we do is load models, which are objs plus a set of OpenGL states. There is only one shader, which is quite simple and highly optimized to do everything we need.
The problems
In the week before launch, the game actually ran pretty well, mostly exceeding 60hz on both the iPhone 4 and the 3GS. However, we had random stutters here and there that were really distracting and could even cause you to crash into a branch and lose.
Optimization 1: Vertex Array Objects
At WWDC last year, the Apple engineers recommended having a look at VAOs, as they can lead to significantly reduced overhead when drawing a lot of batches. Hence, I went ahead and updated our engine. In principle, this is very easy, but there are some pitfalls, and the implementation is very unforgiving. If you make a mistake, the code is very likely to crash, often by some form of memory corruption, deep inside the OpenGL code. After it all worked, we saw a moderate performance gain, but nothing significant.
However, considering how simple this extension is, and how easily it can be built into an engine, I strongly recommend everyone use it. There is nothing to lose here. UPDATE: Actually, there is something to lose. Every single VAO takes up 28 KiB of memory. For Nuts!, that’s 2.5 MiB just for the VAOs. It heavily penalizes VBO animation. It seems to be a good combination with skeletal animation, though.
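For reference, the basic VAO pattern on iOS looks roughly like this (a sketch using the GL_OES_vertex_array_object extension; vbo, ibo, Vertex and indexCount are placeholders, and the attribute layout will differ for your engine):

#include <OpenGLES/ES2/gl.h>
#include <OpenGLES/ES2/glext.h>

// Setup, once per model: record the entire vertex setup into the VAO.
GLuint vao;
glGenVertexArraysOES(1, &vao);
glBindVertexArrayOES(vao);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibo);
glEnableVertexAttribArray(0);
glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (const void *)0);
glBindVertexArrayOES(0);

// Draw, per frame: a single bind replaces all of the calls above.
glBindVertexArrayOES(vao);
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, 0);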
Optimization 2: State Caching
Before my final optimization pass, we were already caching many of the states, so I can’t really give any feedback on that. But we basically didn’t cache any of the OpenGL ES 2.0 states: shaders, uniform bindings, uniforms, etc. In every draw call, we were re-enabling the same shader, loading all uniform locations for that shader, and setting them to the right values. That sounded very much like an opportunity to optimize.
However, after I implemented this, I did not notice any improvement in performance. I don’t know if the driver is now “smart enough” to do the state caching itself, but it seems not to have much effect on overall performance. As such, I would still recommend caching the easy stuff (glEnable states, for example), but caching each individual uniform value seems to be overkill.
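The kind of easy caching I mean looks something like this (a trivial sketch for a single glEnable state; a real cache obviously covers more than blending):

// Redundant glEnable/glDisable calls never reach the driver.
static GLboolean g_blendEnabled = GL_FALSE;

static void SetBlendEnabled(GLboolean enabled) {
    if (enabled == g_blendEnabled) return;  // state unchanged, skip the GL call
    if (enabled) {
        glEnable(GL_BLEND);
    } else {
        glDisable(GL_BLEND);
    }
    g_blendEnabled = enabled;
}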
Optimization 3: Instruments
Instruments is a double-edged sword. On the one hand, I love the leak checking and the new driver analysis. On the other hand, I think the CPU and GPU performance monitors, and the driver analysis, are mostly useless. You may have noticed that I mentioned the driver analysis in both lists; that’s because while it gives you a lot of cool insights and may catch a few bugs, it didn’t offer a lot of valuable insight into making the rendering faster. For the most part, the things it was very obsessed about didn’t have any effect at all. But that may also be because I’ve been doing this for too long.
Optimization 4: Alpha Sorting
Initially, we rendered the scene kind-of arbitrarily. We would render the tree, the squirrel, then render some transparent effect, then the branches. We were more concerned about depth-correct rendering, than about performance at that point. However, the way the iPhone GPU works, it’s actually more beneficial to completely separate the solid from the transparent rendering.
To help implement this, I added a two-pass mode to the engine. The first pass would only allow solid objects to be rendered, and it would complain if any rendering call tried to enable alpha blending. For the second pass, it was the other way around.
This actually helped the performance, especially in the peaks, which were sometimes caused by displaying a lot of transparent effects that would be alternated with solid render calls, like the fireball nuts and their particle effects.
I strongly recommend designing the whole renderer this way: First render all solid objects, then come back and render all non-solid objects. And enforce it, in case the artists try to be smart and fancy about something.
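A minimal sketch of what I mean by enforcing the two passes (the names are mine, not our actual engine API):

#include <assert.h>

typedef enum { PASS_SOLID, PASS_TRANSPARENT } RenderPass;
static RenderPass g_currentPass;

// Every material calls this before drawing; it complains if a draw call
// tries to use blending in the wrong pass.
static void ApplyBlendState(int wantsBlending) {
    if (g_currentPass == PASS_SOLID) {
        assert(!wantsBlending && "transparent material drawn in the solid pass");
        glDisable(GL_BLEND);
    } else {
        assert(wantsBlending && "solid material drawn in the transparent pass");
        glEnable(GL_BLEND);
    }
}

// Per frame:
//   g_currentPass = PASS_SOLID;        RenderSolidObjects();
//   g_currentPass = PASS_TRANSPARENT;  RenderTransparentObjects();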
Optimization 5: High-level optimizations
By far the most significant improvement came from the higher-level optimizations. Usually, the performance issues came down to rendering too many things of one kind, or a model that was weirdly engineered to trash the texture cache with every single one of its hundreds of triangles.
The performance monitor and A/B testing really helped a lot in pinpointing the causes and fixing them.
Also, often when you’re getting stuttering, the performance monitor will tell you that it’s because the frame time is just a little bit too long every other frame, so the system keeps missing one out of every few draw events.
One other important thing to note is that once you know what performance target and visual quality level you’re aiming for, you should figure out the limits of what you can display, and enforce them. If you don’t, players will most definitely take your game, clump up all enemies in one spot, and blow them up in some crazy, unanticipated way that will completely destroy the performance. And it will become the norm. We learned that the hard way in TowerMadness.
Hence, if you implement an effect system that keeps track of and animates effects, also make sure that it has a cap on how many effects it will show, and that it gracefully handles a situation where too many effects are present.
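Something as simple as this is usually enough (a sketch; the cap value, EffectType and Vector3 are placeholders):

#include <stdbool.h>

#define MAX_ACTIVE_EFFECTS 64  // hypothetical cap, tuned per game

static int g_activeEffectCount = 0;

// Returns false if the effect was dropped because the cap was hit.
bool SpawnEffect(EffectType type, Vector3 position) {
    if (g_activeEffectCount >= MAX_ACTIVE_EFFECTS) {
        // Degrade gracefully: skip the new effect (or recycle the oldest one).
        return false;
    }
    g_activeEffectCount++;
    // ... actually set up the effect here ...
    return true;
}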
Also, if you were to make a zombie game, don’t just allow unlimited zombies to spawn. Make sure the numbers are limited, and design the game to work with that number. If the game is only fun through excess that can’t be sustained, you should go back to the drawing board. That’s also good lifestyle advice, now that I think about it.
Summary
As you may have noticed, none of the optimizations really did the job alone. It was the mix that made our game run at 60hz no matter what the player does, even on the 3GS.
There are also many things left to optimize. The math library, for example, is completely unoptimized. But there is no need for that, as it’s not the bottleneck of the game. Optimizing it would probably take a long time, and only reduce the total CPU usage by 2-5% (that’s what we estimated for Nuts!). Having a good profiler helps a lot.
I hope summing up my notes on the Nuts! performance tuning process gave you some ideas about what to optimize, and what is probably not worth it, and I hope it makes your life easier in the future. And hopefully mine too, since I thought about this a lot while writing the article.
In case you’re there, see you at WWDC! We’ll be wearing Limbic shirts most of the time and my MacBook Air has a Yoshi on it, so we’re easy to see. Don’t hesitate to come over and say hi!

Guest Post: Virtual Game Development

Hey iDevBlogADay,

Since I have very little time, as I’m leaving for Palo Alto in a few hours for the start of this year’s WWDC trip, I have asked my fellow Limbic co-founders Iman and Arash to write a little guest post. They’re writing about the problems we face as a company working in two timezones 9 hours apart, and being almost “purely virtual”. Here it goes:

Unlike many startups, Limbic operates as a virtual company. In our case, we collaborate with team members in seven locations across the globe (Palo Alto, Davis, San Diego, Burbank, Germany, the Netherlands, and New Zealand). As one can imagine, operating in this fashion brings many challenges, but in our experience it comes with substantial benefits as well.

In order to support this arrangement, some degree of planning is essential, as meetings across multiple time zones must be coordinated. For our projects, we use a slew of tools for communication and task management:

* Skype, IRC, and iChat for voice and video conferencing

* Skitch and Dropbox for sharing images and videos

* BananaScrum and Lighthouse for project planning and task management

* GitHub for source code hosting and collaborative development

* Doodle.com for scheduling

The most common problem with working across multiple time zones is finding overlaps in the availability of US and European team members to meet. This leads to inevitable late night or extremely early morning meetings. When working on dependent project tasks, we have found it is important to sync up daily and hand off to other team members to ensure smooth and continuous development. If a voice or video meeting cannot be attended on a particular day, individual members communicate their progress via email to the team. Also, because team members aren’t able to casually communicate throughout the day and all discussion happens during meetings, the meetings tend to run quite long in order to cover all issues.

One of the difficulties with virtual collaboration is that it can be slower than face-to-face communication for rapid iteration. We minimize this by using screen sharing, chat, and video conferencing whenever necessary. A tremendous advantage to working virtually is that it allows everyone to work from their own favorite environment (coffee shop, home, etc.). In addition to the environmental benefits, commute time is reduced or eliminated in many cases, allowing more productive time for work. Finally, with no office expenses to pay, the operational overhead of the company can be reduced.

Recently at Limbic, we have moved towards capturing the benefits of a shared workspace by establishing a small studio in Palo Alto as a hub for physical collaboration, while maintaining the flexibility provided by continuing to operate primarily virtually. We also like to bridge the gap between all team members by periodically planning retreats where we all meet up face to face to have fun, brainstorm, and help kick-off new projects.

That’s it! I’m rather excited for the next post, as it’s going to be one week after we launch our new game, Nuts!

German iOS Developers

Hey #iDevBlogADay,

Today’s post is on a more personal and local note. However, if you’re interested in technical stuff, check out my post from two days ago, courtesy of Keith of Imangi.

As most of you guys know, there are a lot of iOS developer get-togethers/meetings in the US. I’ve never had the honor of attending one, but my Limbic co-founders Iman and Arash spoke highly of them.

I’m German, living in Cologne, and so far I’ve barely met any German iOS developers, let alone iOS game developers. Even at GDC Europe, it seemed like iOS wasn’t that big of a deal yet. However, there are a few big shops, like EA, even in Cologne, and it really puzzles me whether there are no German (indie) iOS developers, or whether it’s just hard to find them because there are no proper communication channels.

So, my hopes are that I can find some German (or even European) iOS developers through this channel. Maybe we can even have a few get-togethers somewhere in the region to talk about tech, share knowledge, contacts, etc.

Please let me know (in the comments/by email to volker@limbic.com) if this applies to you and you’re interested!
Cheers,
Volker

Web APIs & plists

Keith of Imangi enlightened me: Instead of only serving JSON data from our web services, we can also serve plists, which are even easier to use on iOS (they don’t require an additional library).

And it’s super simple to support both in a Python-powered server, especially if the code is already dictionary-based:
if format == "plist":
    from libs import plistlib
    out.write(plistlib.writePlistToString(result))
else:
    from django.utils import simplejson
    out.write(simplejson.dumps(result))
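On the iOS side, the plist response can then be turned into a dictionary with nothing but Foundation (a sketch; responseData is assumed to hold the HTTP response body):

NSString *errorDesc = nil;
NSDictionary *result =
    [NSPropertyListSerialization propertyListFromData:responseData
                                      mutabilityOption:NSPropertyListImmutable
                                                format:NULL
                                      errorDescription:&errorDesc];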

If you find this helpful, make sure to check out Imangi’s games, especially Max Adventure <3

App Size

Hey iDevBlogADay,
I’ll only write a short post, as we’re in the final crunch for our next game. I’m super excited about it, and I’ll share more in the following weeks.
This week I’m just going to cover one aspect that is of particular importance to me and Limbic, and I think it’s one of the things that many devs ignore: reasons for keeping your app size small, and how to achieve that. Obviously, this doesn’t apply to every app (I couldn’t imagine Rage being a 20 MiB app), but especially for us indies, there are some important arguments we should consider.
In our upcoming game, our artists did an incredible job at optimizing the assets. We’re right now at around an incredible 8 MiB, and it’s visually stunning (you’ll see :-).
Why keep the App Size small?
I’ll just do a classical bullet list collection of arguments:
  • First of all, there is the over-the-air download limit of 20 MiB. That means any app larger than 20 MiB cannot be downloaded via 3G, and must be downloaded over WiFi. Obviously, this does affect sales to a degree. I’ve heard some numbers floating around of about a 15% sales drop if you go over the 20 MiB limit, so that’s something to consider very carefully.
  • There is a (somewhat complex) relationship between app size, loading time, and memory footprint of the app. And I have a fetish for fast loading times. Having a small memory footprint makes a lot of things easier for development, too (as you can basically ignore the memory warnings, that’s what we’re doing in TowerMadness and it works great).
  • The larger the app, the more likely I am to delete it when I’m hitting my storage limit again (I only have a 16 GiB device, like most people). As such, keeping the app size small means that people are more likely to keep it.
  • With a small app size you can ensure you’re not going over the 20 MiB limit even after many updates. TM went from having 4 maps to more than 60 maps, and it’s still below 20 MiB.
Obviously, there are also drawbacks, such as potentially prolonged development time, limiting quality, and preventing you from using certain features, such as video. I remember one app that was very simple, and could probably fit into about 15 MiB. But it shipped with almost 200 MiB of video, which was shown once.
What can be done to optimize the app size?
For us, the classical items that quickly add a lot of size to an application bundle are:
  1. Videos
  2. Audio
  3. Code Bloat
  4. Textures/Images and other GPU data
I can’t really say that much about 1), as we haven’t included any videos in our apps yet. I don’t know a lot about 2) either, so I can’t say how much you can gain by choosing different codecs, etc. We’re mostly using .caf files, but that’s probably not very optimal, and that’s about all I know.
I can say a lot more about 3) and 4), though. Although it may sound a little strange, code size can very quickly blow up. I’ve done a little survey, and I’ve found binary sizes ranging between 1 MiB and 10 MiB for apps of approximately equal complexity. This can mean that you can double the amount or quality of your assets and still be within the 20 MiB limit. One thing that can cause code bloat is third-party libraries. I’ve seen anything from +10 KiB executable size for a library up to almost a megabyte, where you wouldn’t expect it. And often, it doesn’t even correlate with the complexity of the library’s functionality. So, an important hint here is to watch the executable size when you add libraries.
Another important factor is language-related code bloat. Especially if you use C++ with a lot of templates, this can happen pretty quickly. I haven’t tried this on iOS, but I regularly had issues with Boost blowing up the binary size significantly. The problem with templates in C++ is that, if not written carefully, they can produce a lot of duplicate code that essentially does the same thing, but for a different type. So, a second hint here would be to watch out for these effects, and take executable size into consideration.
Multi-objective optimization
The fourth item on the list is a rather complex topic. On the one hand, you want your assets to be as high quality as possible. On the other hand, you want a low loading time, and small app size.
So you have three different objectives that affect each other. For example, a common way to reduce file size is to use PNG or JPEG for the compression of texture data. While this has a huge effect on file size, it also negatively affects loading time. The PNG will be loaded into memory, then CPU-intensively decompressed into a larger buffer, and then uploaded to the GPU. It can also have grave implications for your memory footprint. While your PNGs are small, once they’re decompressed, they usually sit in memory as RGB or RGBA, using up to 32 bits per pixel. Just a single 512×512 texture then already takes up 1 MiB in memory (and on the GPU). To counteract that, you can do some real-time encoding to a different format (such as RGB565), but this takes up even more CPU time. So, here you have a difficult balance between file size, CPU time, and memory footprint, which can quickly tip over to one side or another, especially if you’re not paying attention and you’re just adding the PNGs the artists hand you.
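Just to illustrate what that real-time encoding looks like, here is a sketch of converting a decompressed RGBA8888 buffer to RGB565 (whether you do this at load time or offline is exactly the trade-off discussed above):

#include <stddef.h>
#include <stdint.h>

// Packs 8-bit R/G/B into one 16-bit 5-6-5 pixel, halving the memory footprint.
void ConvertRGBA8888ToRGB565(const uint8_t *src, uint16_t *dst, size_t pixelCount) {
    for (size_t i = 0; i < pixelCount; ++i) {
        uint8_t r = src[i * 4 + 0];
        uint8_t g = src[i * 4 + 1];
        uint8_t b = src[i * 4 + 2];  // the alpha byte (src[i * 4 + 3]) is dropped
        dst[i] = (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
    }
}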
Our solution
So, here is what we do to tackle this problem at Limbic:
First of all, to minimize loading time, we preprocess everything into a GPU format. That means our meshes are already in the exact format of the VBO data we upload to the GPU. For textures, we use the .pvr format. That means our textures themselves are not compressed (unless we use PVRTC), but they are encoded with a GPU-ready pixel format. This gives the artists direct feedback on the final size of the texture and the strain on the GPU. We then put all the files into a custom package format that’s similar to Zip, with a few performance enhancements. This approach gives us a really small load time, and the memory footprint is about equal to the uncompressed size on disk.
Another important element of the size optimization is a demanding producer who, constantly and from day one, keeps a watch on the file sizes and the game performance, and lets the responsible people know if something is getting out of hand.
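To give an idea of what “GPU format” means in practice, loading such a pre-baked mesh boils down to little more than this (a sketch; error handling omitted, and our actual file layout is not shown):

#include <stdio.h>
#include <stdlib.h>
#include <OpenGLES/ES2/gl.h>

// The file already contains the bytes exactly as the GPU wants them,
// so loading is just: read the file, hand the buffer to OpenGL.
GLuint LoadRawVBO(const char *path) {
    FILE *file = fopen(path, "rb");
    fseek(file, 0, SEEK_END);
    long size = ftell(file);
    fseek(file, 0, SEEK_SET);
    void *data = malloc(size);
    fread(data, 1, size, file);
    fclose(file);

    GLuint vbo = 0;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);
    free(data);
    return vbo;
}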
Conclusion
All the hassle finally pays off for us. For our latest game, that means gorgeous visuals, an 8 MiB file size, a 4s unoptimized initial loading time, and no further delays, for a full 3D game with a completely 3D menu. It’s a constant struggle to get there, but I’m already looking forward to it for the next project.
Back to crunch mode. 🙂
Update:
Bob Koon sent me a link to his page: http://www.appsizematters.com/ — lots of great information there!

TowerMadness Retrospective Part 3

Hey everyone,
as promised last time, I’ll write a little bit about the server side of TowerMadness, and the design decisions we faced. This will be very non-technical.
Preface
I should probably build a little bit of credibility before I start to write about all this server stuff. I’ve been working with web technology forever, and I love the technical side of it. My main achievement in this area prior to the TowerMadness server infrastructure is a little project called Waaagh!TV, which I created in 2004; it has since streamed tens of thousands of Warcraft 3 games in real time to around a million viewers in total.
That said, TowerMadness was a special challenge. At the time, there weren’t any proven frameworks like OpenFeint or Plus+ yet, so we had no choice but to do it ourselves (although we probably would’ve done it ourselves anyway). We had a lot of special requirements, too. Competitiveness was very important to us, and it’s one of the main selling factors of TowerMadness. Replays for every game are analyzed and stored online for other players to watch. We also had DLC maps long before IAPs, although we’re now back to just updating the game, which is also an effective marketing tool. We had real-world prize competitions, but after they generated little publicity we stopped doing them.
The other factor was that we had no idea how well TowerMadness would sell. At that time, there were games that sold 250k+ copies, which seemed like an unreachable number to us. Because of our focus on competitiveness, we asked ourselves an important question: What if it is successful and gets high up in the ranks? It would be terrible if, during that short period of time when we have this huge amount of exposure, the server system couldn’t handle the load and hence severely tainted the user experience.
As you can see, we had a lot of things planned, and fooled around a lot. As such, we wanted a flexible and scalable online system.
TowerMadness Server System
We initially started out with a Dreamhost-hosted PHP+MySQL server in early beta. It was very cheap, but that’s about it. It had many downtimes and high latency, and it seemed impossible to scale, so we dropped it.
We then investigated many of the upcoming, very popular cloud server systems: EC2, AppEngine, and Rackspace. We picked AppEngine mostly because of the ease of use, the very lovely free tier, and the fine-grained scaling (money- and request-wise), all without the manual intervention that EC2 and Rackspace would have required at the time. It seemed like a perfect solution for a startup that doesn’t yet know what it’ll score.
Long story short, AppEngine has proven to be a great decision. We’ve scaled up from only a few requests a minute in beta to 30 requests per second, which is millions of requests a day. Once in a while I needed to improve the scalability of the code, which was mostly due to not knowing how the next order of magnitude of requests would pressure the datastore and CPU, and hence making uninformed mistakes. At no time, however, was our service seriously affected by any prolonged service interruptions (apart from occasional and very rare AppEngine maintenance windows).
Word of caution
AppEngine is not a classical LAMP system with long request times and a full-featured MySQL database. The datastore is very scalability-oriented, and it is really not much more than a hash map that maps keys to automatically serialized objects. The query functionality is very limited.
In my eyes, it simply tries to prevent you from doing any unscalable things (although you can still do those :-). This is the reason why the backing database is so limited. Personally, I learned a lot about scaling into the millions-of-requests-a-day realm by understanding how AppEngine works. There are a lot of interesting things I learned that I should probably share on this blog.
Nevertheless, as a summary I should say that I love AppEngine for what it is: a cheap, efficient, and very scalable solution for many server problems. It was the perfect choice for TowerMadness. But it’s not for everyone, and not for every problem. I think the key to a successful web infrastructure is to pick whatever gets the job done best, and it takes a lot of time and experience to find a good solution. We’re by now running several EC2 nodes, many AppEngine apps, and a few dedicated servers, and we’re still learning our way through this.
Game Accounts
Another very interesting problem related to the server question is that of user authentication. As a game developer, you want an easy-to-use and reliable way of uniquely identifying your users. This enables a lot of interesting features, like online high scores, cloud storage of user progress and settings, and sharing friends’ scores on social networks. However, this can turn out to be pretty tricky.
In TowerMadness, we used a user-entered username plus the device UDID to uniquely identify users. We also wanted to support multiple accounts per device, a decision that made our life harder while adding little to the game, as only a very, very small fraction of the users use this feature. Later on, we added GameCenter. This caused a lot of trouble and still creates issues now and then.
All in all, I’m not so happy with how the whole situation turned out. The solution we have in TowerMadness is far from ideal, and I would definitely change it if I did it again.
However, the whole user authentication is still an unresolved problem, so here is a quick overview of the most popular techniques that I’m aware of:
Game Center:
Apple’s built in iOS social network for games.
+ uniquely identifies a user
+ built-in authentication
+ provides a social network of friends
– GameCenter adoption rate not great (probably because the signup and authentication is not great yet)
– GameCenter API has issues
– GameCenter does not provide secure authentication for a third-party server (such as an OAuth token), so storing private information is a bad idea. Your user id is public, so someone can easily mess with your data.
– locked into the iOS platform (hopefully MacOS soon)
UDID:
Using the Device’s unique ID to identify a user.
+ no signup, no question to the user, very lightweight
– no secure authentication
– uniquely identifies device, not user (can’t easily transfer to new device)
Email+password system:
Asking for the user email and securing it with a password
+ uniquely identifies users across devices
+ can be combined with a newsletter signup
+ people don’t forget their email as easily as a random user name
– people need to sign up with their emails, which can be awkward
– you need to manage password recovery, password changing, etc
Facebook:
Using the Facebook ID of the user and valid authentication tokens to uniquely identify the user and integrate with the Facebook platform.
+ tight viral marketing integration
+ secure authentication via OAuth
+ Everyone has an account
+ people don’t forget the login, you don’t need to take care of account management
– some people are allergic to Facebook logins/signups
Twitter:
Same as facebook, except:
+ in my experience, fewer allergies to typing in Twitter credentials (probably because there’s less private information on Twitter)
– fewer users, and hence less viral potential
Third Party:
Using a third party service like OpenFeint, AGON, Plus+, Scoreloop, Cloudcell.
+ usually easy to integrate and provides a lot of convenience (cloud storage, etc)
+ no worries about server side
+ usually secure authentication
– gives control out of your hands. The more important the online business is for your app, the less attractive this is
– can go out of business (see AGON shutting down http://developer.agon-online.com/blog/)
– may not provide a social network/friend graph
– high fragmentation between OpenFeint, GameCenter, Plus+, Agon, etc.
– usually add a large footprint to your binary
As you can see from my unscientific and biased presentation, there is still no perfect solution for the online account system. However, my hopes are still for a better GameCenter API with secure server authentication, better user experience and MacOS support.
I really hope that one of the frameworks will eventually become ubiquitous, so that integrating social features (such as friend scores) becomes more effective, as it’s something really fun. I love comparing my scores to my friends’; it’s so much more personal than fighting for rank #342523 against “MikeDevil147”.
But for now, it seems that either going with a third-party framework, or rolling your own thing with multiple different authentication methods (Facebook, GameCenter, maybe email+password), is the best solution for a game like TowerMadness.
Conclusion
A scalable server side can be a very difficult problem and should never be underestimated. User authentication is also something that still needs to be solved in the long run.
As usual, let me know in the comments if you have any specific questions, I’d love to answer them.

TowerMadness Retrospective Part 2

Hey everyone,
welcome back! This week, I’m going to write about our decision to go to three dimensions, and how we did that technically.
Why 3D?
As you may remember from my last post, after a few weeks we actually had a fully working 2D game that was ready to ship, modulo artwork. We then decided to pimp the look of the game by bringing Arash onto the team. At this point, we made the decision to go 3D, but it involved a lot of thinking. Some of the core arguments:
  • We were 3 people on the team now, all of us having a degree in 3D Computer Graphics
  • The artist was significantly more experienced in creating 3D artwork than creating 2D artwork
  • At that time, there weren’t any 3D RTS games on the AppStore
  • Last, but not least, we love 3D; that’s why we studied it. Since TowerMadness was a fun project, we figured we’d pick what was most fun for us.
There is one key problem with the third argument, being the only 3D RTS game on the AppStore: We got beaten to market by Star Defense. Another tower defense game. In 3D. Featured in a Steve Jobs keynote. And they were using spherical maps. You can imagine we were pretty shattered for a short bit. But we didn’t give up; we took TowerMadness and made it more beautiful, faster, and more fun. And now, almost two years later, we’re still here and kickin’. Seems like our passion helped us over that hurdle. Our lesson: don’t give up, even if (for the moment) it looks like you just got beaten in every possible regard.
Anyways, back to 3D engines.
3D Engine Overview
I’m reluctant to even call what we have in TowerMadness an “engine”. What it really is: a very pragmatic collection of utilities for, and a wrapper around, OpenGL ES1. The features:
  • It caches the OpenGL states, and allows data-driven modification of these through custom material files (yes, we love JSON 🙂), like:
"lightning": {
    "tex": "lightning",
    "depthmask": 0,
    "color": [1.0, 1.0, 1.0, 1.0],
    "cull": "none",
    "blend": "alpha"
}
  • It can load .pvr textures (by the way, check out my little PVR Quicklook plugin: https://github.com/Volcore/quickpvr)
  • It loads a .texture file, which has additional information about textures, such as filter modes, like
{
    "type": "pvr",
    "file": "landmineA",
    "mag": "linear",
    "min": "linear_mipmap_linear"
}
  • It loads .vbo files, which are nothing more than externally pre-compiled vertex and index buffer objects that get loaded straight onto the GPU. We wrote a little C++ tool that loads the .obj and outputs the .vbo file
  • It loads .model files, which combine all of the above, linking a material with a vbo file, like:
"landmineA": {
    "vbo": "landmineA",
    "material": "landmineA"
}
  • It supports a point sprite cache, where you can queue up point sprites with certain materials, locations, sizes, and they get all rendered in one batch per material later
  • It doesn’t free anything. Since the RAM and GPU footprint of TowerMadness is really small, we get away with that
  • Everything is handle-based. This turned out to be great. If a model couldn’t be loaded, or someone forgot to load it, it didn’t crash the game. Rather, it always showed a little colored cube. In one instance, we forgot to add a .model and .vbo file to Xcode, and the game shipped without the flamethrower level 3 model. Instead, it would show the “flaming cube of death” (see below), as its fans called it. Imagine if it had crashed the game instead.

And that’s it, our “Engine”. In a nutshell, to the game code it’s not more than PGL_loadModel and PGL_drawModel, with a lot of OpenGL transforming and matrix pushing. But that’s all it needs. There is no in-engine culling (although we experimented with that, more on this later), no post processing, no lighting, no multi-texturing, no particle systems.

And that’s a little bit odd. When thinking about 3D engines, an engine always seems like this magical thing with so many fascinating features. But that often clutters the most important feature, which is to place a certain model at a certain position. And that’s what you need most of the time for a regular game. As such, the engine should be optimized to make exactly that very easy.
Pipeline
To make it easy to place objects into the game, having a good art pipeline is extremely important. This includes every step the artist needs to do in order to get a new model into the game. This is so important because the longer the pipeline, the longer the iteration times.
As such, ideally, you’d have something where a plugin in the DCC tool exports the object right into the game. But since we’re indies, and we normally don’t have time or money to develop these tools ourselves, we need to improvise. And since we’re agile, we start out somewhere, and then iterate with artist feedback until it’s good enough.
Initially in our engine, the artists had to export their models as .obj, then convert them to .vbo, create a .model, convert the texture to .pvr, create a .texture, create a .material file, then add everything to Xcode and add the PGL_loadModel/PGL_drawModel commands. That’s a lot of steps, especially considering that most of them will be identical. That’s what we realized (way too late in development, though), and optimized this a little. The .model, .texture and .material files looked the same for most of the objects: solid, trilinearly textured models, and the filenames of the texture and vbo are the same, modulo the file extension. Hence, we made those files optional, which saved the artists a lot of work while also avoiding a lot of very small files.
Performance
I’m a performance fetishist, at least to some degree. I can’t stand it if a game doesn’t run at 30/60hz (depending on the genre). I will push the team for days to find the cause of stuttering, slowdowns, etc.
But I’m also refraining from low level optimization as much as possible.
I believe it’s much more important to properly bound and control the amount of stuff rendered than to get the last 5% out of the rendering code in order to render 5% more stuff on screen, especially if it comes at the cost of ease of use and maintainability.
This is what went terribly wrong in TowerMadness. Not only were the new levels constantly pushing the size, the number of spawn points (and hence the number of aliens), the buildable spaces and the doodads on the screen, and not only was the art constantly getting a few more triangles here and a few more triangles there. We also defied all reason and added an endless mode to the game. Before the endless mode, every level was limited in time and complexity. Because we knew how much money you were getting in total, we knew how many towers you would be able to build at most. And we knew how many aliens there were at most, because even if you slowed all of them, there was a finite and usually small number. But we (and the fans) wanted more. More towers, more aliens, more everything. So we added that endless mode, and suddenly there was no upper bound anymore. The players could send wave after wave, slow them down, and populate large maps with many, many towers, until everything was maxed out with (rendering- and cost-)expensive nuke towers. As such, we’re regularly getting complaints about the game becoming unplayably slow after some absurdly high wave in endless mode that the game was never designed for. To date I’m not sure if adding the endless mode was a good idea.
In general though, for TowerMadness the main rendering performance problem is that we have a lot of stuff on the screen. There are many doodads (trees, barn, sheep, etc.), aliens, and towers. Just to give you some estimated numbers, I think there are about 40-50 tree groves, 1-50 towers, 0-400 aliens, and 10-20 doodads on the screen. And the opportunity for batching is very limited. However, I think the most important reason why the game runs as fast as it does is that we do a lot of “smart” batching. E.g. when rendering the towers, we first render all the tower bases, then all the towers of one type, then the towers of the next type, etc. That way we minimize the state changes, but without actually performing any form of sorting on the engine side. It’s all pre-sorted.
We actually spent quite a while on optimizing. One other approach was to add frustum culling to every PGL_drawModel call. While it worked and culled a lot of stuff, the problem was that the generic culling was about as expensive as many of the rendering calls, which gave us no speedup at all. In the worst case (fully zoomed out), the game was running at about 50% of the speed it had without culling.
Then we did a “smarter” culling approach, where we just cull certain things, like towers, which have several draw calls. This actually gave a nice speedup.
The main problem, though, were the trees. Here we tried several things:
  1. Just render each tree grove separately
  2. Same as above, but with frustum culling
  3. Put all groves into one Vertex Array (VA) and render that (preprocessed, avoids state changes)
  4. Put all groves into one VBO and render that (preprocessed, avoids state changes)
  5. Put all groves into one VBO with dynamic culling (on the fly)
I sadly don’t have the numbers anymore, and they’d probably be outdated by now, because it was optimized on my old iTouch2G, with an MBX. There is one curious result, though. On the MBX, using a VA was as fast as using a VBO, even if the VBO was uploaded as a preprocessing step and then never changed in any frame. Apparently, the VBOs were _retransmitted every frame_ on the old MBX hardware! This is (luckily) no longer true for the SGX.
Eventually, we ended up using the second technique, rendering every grove separately and using a simple form of frustum culling.
3D Engine: The Bad

We have a very curious thing that comes up every once in a while. Some players, and reviewers, complain about how we’re not really using the 3D engine to do crazy things. Apparently, to them, having a 3D engine means it has to be used. However, I beg to differ. In the end, I don’t think it did us much harm.
The other problem with 3D is that it’s most likely going to be slower and more work than making a 2D game, so I guess you need to know what you’re doing, and you should like to suffer when you do this.
3D Engine: The Good

One thing I haven’t mentioned yet is that since we made the game in 3D, it scales up very well to different display sizes. Going to the iPad and later Retina display, we had very little work to do. As far as I remember, it was just UI stuff that needed to be scaled up.
Other than that, we just love 3D, so that’s why we did it.
Would I do it again?
Yes. I think it turned out really well, and matched perfectly to the skills of the team. And remember that this was designed for the iPhone hardware of the first generation. With the A5 out now, this is going to be a wild place. We’ve done some very cool stuff in our upcoming titles, but I can’t talk about that. Not today 🙂
Next time I’ll be writing about my little AppEngine-bag-of-tricks and how I got the TowerMadness server to scale to millions of requests per day.

TowerMadness Retrospective Part 1

[Note: Blogger owned me a little bit when I tried to put the formatted post in here, so the version that went up on iDevBlogADay had some issues. They’ve since been edited.]
I’m excited to finally be able to contribute to this collective of great iPhone indie wisdom! Over the next weeks, I’ll talk about some technical decisions and features of our first game, TowerMadness, and how they turned out in the end. While this post is meant to give others facing similar design decisions more input, it also helps me realize what was and is going on.
But first…
…let me give you an idea of who I am, and what qualifies me to write on the #iDevBlogADay list. I’m Volker, co-founder, Shogun of Technology, Emperor of Web Development and King of Game Design at Limbic Software. In short, I’m the one to blame if any of our apps doesn’t perform well, whether on the performance, stability or gameplay side. I met the other two co-founders while doing my master’s at UCSD (for which, after 2 years, I finally submitted a paper, yay!), where the company was born in early 2009. After a very short PhD period in Aachen in late 2009, I’m also proud to be able to call myself a college dropout. It was at the time when TowerMadness was starting to get really successful, and I had the chance to become a full-time indie gamedev. Now, in 2011, we’ve been to #1 on the App Store, earned many awards, and we’ve got more than 5 million downloads. What a crazy journey.
TowerMadness
Our first game, TowerMadness, was initially really just a little fun project, thought up by Iman and me on surfboards at the La Jolla beach. Because of time constraints (60h/week M.S. thesis), we were mostly working on it on the weekends, after surfing. Inspired by the countless Flash and Warcraft 3 tower defenses we had played, and a passionate love for them, after only a few weeks we actually had an almost complete game.

The iTD Prototype


We lovingly called it iTD, and it was a lot of fun. But the problem was that neither Iman nor I was an artist at all (I gloriously painted all the tower icons up there in Gimp, the sheep too). Combine that with the fact that Fieldrunners had just come out and was taking the AppStore by storm. And although we were up to par in terms of gameplay, our graphics were atrocious. So we got Arash on board, who is not only a great programmer with a strong 3D background but also a great 3D artist, founded the company, and started from scratch.

TowerMadness as you can find it on the AppStore


This was also the point in time when a lot of important design decisions were made.
  • The original iTD prototype was written in almost 100% Objective-C. We tried our best at proper OOP; every little dude had its own class. And it was slow. Because we didn’t know the platform very well and wanted to play it safe, we decided to go with a minimalistic approach and use C99 to develop TowerMadness. It turned out to work really well, as the performance is great and we never had any issues with crash bugs, even in the early alpha stage. Of course, it may have turned out the same with any other programming language, but the end result is proof that C99 worked out just fine for us.
  • Back in iTD you were shooting the sheep. However, after an intervention by my girlfriend, we decided that we don’t want to hurt animals, and started shooting at aliens instead.
  • Because Arash, Iman and I are all 3D Computer Graphics master’s or PhD candidates, we decided that making the game 3D would be worth a shot, especially since there hadn’t been any 3D tower defense games on the AppStore. I’ll talk about the 3D stuff in more detail in the coming weeks.
  • We decided to launch with only a few maps. The first alpha had 1 map, 3 towers, beta up to 3 maps, 9 towers, the release then had 4 maps and 9 towers. The game now, after almost two years of updating, has something around 60 maps and 12 towers.
  • We started out with a Dreamhost’d php script for the server side, but before launch we switched to AppEngine — a great move, considering we’re getting millions of requests a day now.
In retrospect, we used a very pragmatic and iterative development strategy, which probably was the reason it ever actually got finished. At the time, we were all still doing something else full-time.
In the rest of this post, I’ll talk about a special little feature that, I believe, no other RTS game on the AppStore has so far: replays.
Replays
Replays are a very interesting aspect of an RTS game. You couldn’t imagine Starcraft 2 or Warcraft 3 without the ability to watch replays of yourself or your favourite players.
The idea is that every game a player plays is recorded, and can be played back. You can then take that replay and share it on the internet, analyze it with an automated tool, use it for debugging, or run cheat-proof competitions with it.
Sadly, this topic can get very boring and specific very quickly, so I’m trying to focus on the important design-relevant parts and the pros and cons. If you’d like to know more about this topic, let me know in the comments.
How does it work?
How this works depends on your game. If you have a game with few entities (e.g. a classical shooter like Quake), you can get away with just storing the positions of the players, items, and bullets in snapshots, and then storing delta updates and important events in between. For an RTS game like TowerMadness, or even Starcraft, this doesn’t work because you have so many units and buildings. It would be an immense amount of data.
Instead, in an RTS game, you only record the game commands. For TowerMadness, that sounds pretty simple, since there are only 4 such commands: Build Tower, Upgrade Tower, Sell Tower, and Start Game/Send Next Wave. When playing back a game, the game simulates each and every step that the original game did, and applies the commands at exactly the same times. This is where it gets really tricky, as every computation needs to be exactly as it was when the original game was played. This synchronicity is one of the key challenges in implementing replays in an RTS game.
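Conceptually, a recorded game is then little more than the initial state (map, random seed) plus a list of time-stamped commands, something like this (a simplified sketch, not our actual file format):

#include <stdint.h>

typedef enum {
    CMD_BUILD_TOWER,
    CMD_UPGRADE_TOWER,
    CMD_SELL_TOWER,
    CMD_NEXT_WAVE
} CommandType;

typedef struct {
    uint32_t    tick;       // simulation step at which the command was issued
    CommandType type;
    uint16_t    cellX;      // grid position (ignored for CMD_NEXT_WAVE)
    uint16_t    cellY;
    uint8_t     towerType;  // which tower to build or upgrade to
} Command;

// Playback: run the same fixed-step simulation and apply each command
// at exactly the tick it was recorded at.
//   for (uint32_t tick = 0; tick <= lastTick; ++tick) {
//       while (next < commandCount && commands[next].tick == tick)
//           ApplyCommand(&game, &commands[next++]);
//       SimulateStep(&game);
//   }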
Keeping it Synchronous
On the one hand, if you follow a few simple rules, such as “the gamestate may never depend on external factors besides the input”, it seems pretty easy to keep a game in sync. On the other hand, the problem is that every tiny bug has the potential to break the replay feature.
When the synchronicity breaks, the result can be compared to a butterfly effect. What happens is that the game slowly starts to become “weird”. As an example, imagine your computation for which enemy to shoot at uses an external input (such as rand()). When you play back a replay, instead of aiming at the little alien as in the original simulation, which pays out money very quickly, the replayed game aims at a large alien. That way, the player doesn’t get the money in time to execute the next recorded build command, so the replayed game refuses to execute that command. And without that second tower, you’re in trouble, and the game will inevitably take a dramatically different and mostly fatal course.
To detect such issues instantly, every few frames, and every time we execute a command, we compute a checksum of the game state. This checksum is then verified when the replay is played back. The checksum doesn’t tell you what went wrong, but it tells you _that_ something went wrong. And that is usually a great indicator of a bug somewhere in the code.
One method I use frequently to keep the replay feature sane is to run the replay at the same time as the real game during development, in lockstep, and to trigger a breakpoint if the game goes out of sync.
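The checksum itself doesn’t need to be fancy; hashing the handful of values that drive the simulation is enough to catch a divergence. A sketch (the FNV-1a hash and the GameState fields here are just an example):

#include <stddef.h>
#include <stdint.h>

// FNV-1a over a block of memory.
static uint32_t HashBytes(uint32_t hash, const void *data, size_t size) {
    const uint8_t *bytes = (const uint8_t *)data;
    for (size_t i = 0; i < size; ++i) {
        hash ^= bytes[i];
        hash *= 16777619u;
    }
    return hash;
}

// Checksum over the parts of the game state that must stay in sync.
uint32_t GameStateChecksum(const GameState *game) {
    uint32_t hash = 2166136261u;
    hash = HashBytes(hash, &game->money, sizeof(game->money));
    hash = HashBytes(hash, &game->lives, sizeof(game->lives));
    for (int i = 0; i < game->alienCount; ++i) {
        hash = HashBytes(hash, &game->aliens[i].position, sizeof(game->aliens[i].position));
    }
    return hash;
}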
What can you do with it
There is the obvious feature of giving players the opportunity to watch their own, their friends’, and other people’s games to learn about other strategies. Some people counter this by saying that the replay feature makes the game boring, because you know the “solution” to a level. While this is true to some degree, I don’t see why players shouldn’t be able to see the right solution if they want to. And the replay feature stirs up the competition. We’ve had many people on our leaderboards who push the strategies forward, beating each other by just a few points at a time.
Another interesting feature is that you can analyze the uploaded replays on the server side, to learn about popular towers, easy and hard maps, and to compute statistics. This is what we did for the first year or so. But after a while it became apparent that the sheer amount of replays submitted was too much even for my quad-core i7 running 8 instances of the stats bot at the same time. So now we actually compute the stats on the client side and submit them with the replay. We can then randomly and selectively verify whether someone is cheating, especially if they have high scores on the leaderboards.
As we’re all gamers and love competition (I’m totally into Starcraft 2, as you can see from my account), another interesting feature is running competitions. We did so by releasing new maps for which the public replay option was disabled, and everyone could submit their replay for a week. After the week, whoever got the highest legit score won. With replays, you can actually verify that every game submitted was legit. Whatever “cheating” you may do, it has to result in a game that plays back correctly.
One of TowerMadness’ key features is the competitive play. Because the leaderboards are the heart of the competitiveness, we regularly clean them of any cheating attempts using a little verification bot.
Replay Cons
Here is a summary of what is not so great about planning and implementing a replay feature:
  • Players can get easily confused if the replays are out of sync. In general, I would say that no replay feature is better than a broken replay feature.
  • It can be really difficult to get this feature right, particularly regarding the non-reproducibility of IEEE 754 floats, especially with math libraries and trig functions that have tiny differences across platforms. I’ve managed to get our code working on iOS (all devices), MacOS 32 and 64, and Linux x32 (x64 had a few issues), but it was a long, non-trivial process.
  • It took us about half a year post launch to iron out all the issues from the replay feature.
  • If you work on the game code with several people, it gets even harder, since you have to educate them about writing synchronous code. Here, it really helps to have MS/PhD colleagues though 🙂
Replay Pros
And here are the reasons why you would want it anyway:
  • Although the replay feature itself is somewhat hidden in TowerMadness (we could’ve done a better job at integrating it), it still has a profound impact on many aspects of the game and game design. Additionally, players that know about it love it.
  • The replay feature is a strong driver for the competitiveness of our players. We’ve got people that battle it out on our leaderboards for days.
  • Replays can be a great help in balancing the game, by looking at how players play it and detecting any abuses and too-strong or too-weak game mechanics.
  • Replays enable cheat-proof competitions. However, eventually these didn’t really matter, and didn’t drive the sales any more than the leaderboards already did, so we abandoned them.
Was it worth it?
Now, eventually it all comes down to these questions: Was the replay feature worth its time and effort, and would I implement it again?
I must confess I am a little bit biased here, because I’m also running a broadcasting service for Warcraft 3, Waaagh!TV, where we take the replay data and broadcast it to thousands of viewers in real time. As such, I probably value this kind of technology a little more highly than the average person would.
However, I do believe that the replay feature has an impact on the success of TowerMadness. It’s one of the most competitive Tower Defense games on the AppStore, and the competitiveness is amplified by giving the players more than just a score they have to beat.
Replays are also a valuable tool for the development process, helping you find bugs and assisting the game design by helping you understand how players play your game.
As such, I would say that the feature was definitely worth it for us, considering our focus was on creating a competitive game.
Conclusion
Wow, that was a pretty long post. I hope it didn’t bore you guys too much, in case anyone actually even reads all the way down to here. Next week I will write another tech retrospective, probably about our 3D “engine”.