TowerMadness Retrospective Part 3

Hey everyone,
as promised last time, I’ll write a bit about the server side of TowerMadness and the design decisions we faced. This will be very non-technical.
Preface
I should probably establish a little credibility before I start writing about all this server stuff. I’ve been working with web technology for a very long time, and I love the technical side of it. My main achievement in this area prior to the TowerMadness server infrastructure is a little project called Waaagh!TV, which I created in 2004 and which has since streamed tens of thousands of Warcraft 3 games in real time to around a million viewers in total.
That said, TowerMadness was a special challenge. At the time, there weren’t any proven frameworks like OpenFeint or Plus+ yet, so we had no choice but to do it ourselves (although we probably would’ve done it ourselves anyway). We had a lot of special requirements, too. Competitiveness was very important to us, and it’s one of the main selling points of TowerMadness. Replays of every game are analyzed and stored online for other players to watch. We also had DLC maps long before IAPs existed, although we’re now back to simply updating the game, which is also an effective marketing tool. We ran real-world prize competitions, but stopped after they generated little publicity.
The other factor was that we had no idea how well TowerMadness would sell. At that time, some games had sold 250k+ copies, which seemed like an unreachable number to us. Because of our focus on competitiveness, we asked ourselves an important question: what if the game is successful and climbs high in the charts? It would be terrible if, during that short window of huge exposure, the server system couldn’t handle the load and severely tainted the user experience.
As you can see, we had a lot of plans and liked to experiment, so we wanted a flexible and scalable online system.
TowerMadness Server System
In early beta we started out with a PHP+MySQL server hosted at Dreamhost. It was very cheap, but that was its only advantage: it had frequent downtime and high latency, and it seemed impossible to scale, so we ruled it out.
We then investigated several of the up-and-coming cloud platforms: EC2, AppEngine and Rackspace. We picked AppEngine mostly for its ease of use, its very generous free tier, and its fine-grained scaling (in both cost and request volume), all without the manual intervention that EC2 and Rackspace required at the time. It seemed like a perfect solution for a startup that doesn’t yet know how successful it will be.
Long story short, AppEngine has proven to be a great decision. We’ve scaled up from only a few requests a minute in beta to 30 requests per second, which adds up to millions of requests a day (30 × 86,400 seconds ≈ 2.6 million if sustained around the clock). Once in a while I had to improve the scalability of the code, mostly because I didn’t know how the next order of magnitude of requests would stress the datastore and CPU, and so I made uninformed mistakes. At no point, however, was our service seriously affected by prolonged interruptions (apart from very rare AppEngine maintenance periods).
Word of caution
AppEngine is not a classical LAMP system that allows long-running requests and gives you a full-featured MySQL database. The datastore is built for scalability: it is really not much more than a hash map from keys to automatically serialized objects, and its query functionality is very limited.
In my eyes, AppEngine simply tries to prevent you from doing anything unscalable (although you still can :-), which is why the backing datastore is so limited. Understanding how AppEngine works taught me a lot about scaling into the realm of millions of requests a day, and there are many interesting lessons I should probably share on this blog.
Nevertheless, to sum up: I love AppEngine for what it is, a cheap, efficient and very scalable solution for many server problems. It was the perfect choice for TowerMadness. But it’s not for everyone, and not for every problem. I think the key to a successful web infrastructure is picking whatever gets the job done best, and finding a good solution takes a lot of time and experience. By now we’re running several EC2 nodes, many AppEngine apps and a few dedicated servers, and we’re still finding our way.
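To make the “hash map with limited queries” idea concrete, here is a toy in-memory model in Python. It is not the AppEngine API: the names `ToyDatastore`, `put`, `get` and `query_eq` are made up for illustration. It only sketches the general shape under that assumption: entities are serialized and stored under a key, and the only query is an equality filter on a property that is indexed at write time, so each query is a cheap lookup rather than a scan.

```python
from __future__ import annotations

import pickle
from collections import defaultdict
from typing import Any


class ToyDatastore:
    """Toy key -> serialized-object store; NOT the AppEngine datastore API."""

    def __init__(self, indexed_props: set[str]) -> None:
        self._rows: dict[str, bytes] = {}  # key -> pickled entity
        self._indexed = set(indexed_props)
        # (property, value) -> keys; updated on every write, like a built-in index
        self._index: dict[tuple[str, Any], set[str]] = defaultdict(set)

    def put(self, key: str, entity: dict[str, Any]) -> None:
        self.delete(key)  # drop stale index entries for an overwritten key
        self._rows[key] = pickle.dumps(entity)
        for prop in self._indexed.intersection(entity):
            self._index[(prop, entity[prop])].add(key)

    def get(self, key: str) -> dict[str, Any] | None:
        blob = self._rows.get(key)
        # Every read deserializes a fresh copy, so callers can't mutate stored data.
        return None if blob is None else pickle.loads(blob)

    def delete(self, key: str) -> None:
        old = self.get(key)
        if old is None:
            return
        for prop in self._indexed.intersection(old):
            self._index[(prop, old[prop])].discard(key)
        del self._rows[key]

    def query_eq(self, prop: str, value: Any) -> list[str]:
        """Equality filter on a pre-indexed property; other queries are refused."""
        if prop not in self._indexed:
            raise ValueError(f"no index for property {prop!r}")
        return sorted(self._index.get((prop, value), set()))


if __name__ == "__main__":
    store = ToyDatastore(indexed_props={"player"})
    store.put("score:1", {"player": "alice", "points": 900})
    store.put("score:2", {"player": "bob", "points": 750})
    print(store.get("score:1"))             # {'player': 'alice', 'points': 900}
    print(store.query_eq("player", "bob"))  # ['score:2']
```

Anything not covered by an index, such as filtering on `points` in this toy, simply raises an error. That is roughly the kind of constraint that keeps you from writing queries that won't scale.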
Game Accounts
Another very interesting problem related to the server question is user authentication. As a game developer, you want an easy-to-use and reliable way of uniquely identifying your users. This enables a lot of interesting features, like online high scores, cloud storage of user progress and settings, and sharing scores with friends on social networks. However, this can turn out to be pretty tricky.
In TowerMadness, we used a user-entered username plus the device UDID to uniquely identify users. We also wanted to support multiple accounts per device, a decision that made our lives harder while adding little to the game, since only a tiny fraction of users use this feature. Later on we added GameCenter, which caused a lot of trouble and still creates issues now and then.
All in all, I’m not very happy with how the whole situation turned out. The solution we have in TowerMadness is far from ideal, and I would definitely change it if I were to do it again.
That said, user authentication in general is still an unresolved problem, so here is a quick overview of the most popular techniques I’m aware of:
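A minimal sketch of the username-plus-UDID idea, assuming a hypothetical `account_key` helper that derives a datastore key from both values. This is an illustration, not the actual TowerMadness code; treating usernames case-insensitively is also an assumption.

```python
import hashlib


def account_key(udid: str, username: str) -> str:
    # Hypothetical scheme: usernames are compared case-insensitively (assumption).
    normalized = username.strip().lower()
    # Hashing gives a fixed-length key and keeps the raw UDID out of the key itself.
    raw = f"{udid}\n{normalized}".encode("utf-8")
    return "acct:" + hashlib.sha256(raw).hexdigest()


if __name__ == "__main__":
    a = account_key("UDID-AAA", "alice")
    b = account_key("UDID-AAA", "guest")  # second account on the same device
    c = account_key("UDID-BBB", "alice")  # same name, new device
    print(a == account_key("UDID-AAA", " Alice "))  # True: same account
    print(a == b, a == c)  # False False
```

In this sketch, different usernames on one device give separate accounts, while the same username on a new device gives a different key, so progress does not follow the user. Neither input is secret, so such a key identifies an account but does not authenticate anyone.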
Game Center:
Apple’s built in iOS social network for games.
+ uniquely identifies a user
+ built-in authentication
+ provides a social network of friends
– GameCenter adoption rate is not great (probably because signup and authentication aren’t great yet)
– GameCenter API has issues
– GameCenter does not provide secure authentication for a third-party server (such as an OAuth token), so storing private information is a bad idea. Your user ID is public, so someone can easily mess with your data.
– locked into the iOS platform (hopefully MacOS soon)
UDID:
Using the Device’s unique ID to identify a user.
+ no signup, no question to the user, very lightweight
– no secure authentication
– uniquely identifies device, not user (can’t easily transfer to new device)
Email+password system:
Asking for the user’s email address and securing the account with a password.
+ uniquely identifies users across devices
+ can be combined with a newsletter signup
+ people don’t forget their email as easily as a random user name
– people need to sign up with their email addresses, which can be awkward
– you need to manage password recovery, password changing, etc
Facebook:
Using the user’s Facebook ID and valid authentication tokens to uniquely identify the user and integrate with the Facebook platform.
+ tight viral marketing integration
+ secure authentication via OAuth
+ almost everyone has an account
+ people don’t forget the login, you don’t need to take care of account management
– some people are allergic to Facebook logins/signups
Twitter:
Same as Facebook, except:
+ in my experience, fewer allergies to typing in Twitter credentials (probably because people share less private information on Twitter)
– fewer users, and hence less viral potential
Third Party:
Using a third party service like OpenFeint, AGON, Plus+, Scoreloop, Cloudcell.
+ usually easy to integrate and provides a lot of convenience (cloud storage, etc)
+ no worries about server side
+ usually secure authentication
– you give control out of your hands; the more important the online business is for your app, the less attractive this is
– can go out of business (see AGON shutting down http://developer.agon-online.com/blog/)
– may not provide a social network/friend graph
– high fragmentation between OpenFeint, GameCenter, Plus+, Agon, etc.
– usually add a large footprint to your binary
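Several items above hinge on “secure authentication”. One common way to get it on your own server, sketched here under assumptions, is to verify the user once (for example with a password or a Facebook OAuth token) and then hand back a token signed with a server-only secret. Unlike a public GameCenter player ID, such a token can’t be forged by someone who merely knows the user ID. `SERVER_SECRET`, `issue_token` and `verify_token` are hypothetical names; the code uses Python’s standard `hmac` module and is not how Facebook or GameCenter implement their own tokens.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets
import time

# Hypothetical server-only secret; a real server would load it from config, not regenerate it.
SERVER_SECRET = secrets.token_bytes(32)


def _sign(payload: str) -> str:
    return hmac.new(SERVER_SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def issue_token(user_id: str, ttl_s: int = 3600, now: float | None = None) -> str:
    """Call only after the user has been verified by some real login step."""
    issued = time.time() if now is None else now
    payload = f"{user_id}|{int(issued + ttl_s)}"
    return f"{payload}|{_sign(payload)}"


def verify_token(token: str, now: float | None = None) -> str | None:
    """Return the user ID if the signature is valid and unexpired, else None."""
    try:
        user_id, expires_s, sig = token.rsplit("|", 2)
        expires = int(expires_s)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{user_id}|{expires_s}")
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(sig.encode("utf-8"), expected.encode("utf-8")):
        return None
    current = time.time() if now is None else now
    return user_id if current <= expires else None


if __name__ == "__main__":
    tok = issue_token("player42", ttl_s=60, now=1000.0)
    print(verify_token(tok, now=1030.0))  # player42
    forged = "player43" + tok[len("player42"):]
    print(verify_token(forged, now=1030.0))  # None
    print(verify_token(tok, now=1061.0))  # None (expired)
```

The token carries its own expiry and signature, so the server can check it without a datastore lookup; revoking a single token before it expires would need extra state.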
As you can see from my unscientific and biased presentation, there is still no perfect solution for online account systems. However, I’m still hoping for a better GameCenter API with secure server authentication, a better user experience and MacOS support.
I really hope that one of the frameworks will eventually become ubiquitous, so that social features (such as friend scores) become more effective, because they’re really fun. I love comparing my scores with my friends’; it’s so much more personal than fighting for rank #342523 against “MikeDevil147”.
But for now, either going with a third-party framework or rolling your own system with multiple authentication methods (Facebook, GameCenter, maybe email+password) seems to be the best solution for a game like TowerMadness.
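Rolling your own system with multiple authentication methods mostly comes down to mapping each external identity onto one internal account. Here is a minimal in-memory sketch, assuming each provider’s credential has already been verified by the caller; the `AccountRegistry` class and the provider names are hypothetical.

```python
from __future__ import annotations

import uuid


class AccountRegistry:
    """Maps (provider, provider_user_id) pairs to internal account IDs."""

    def __init__(self) -> None:
        self._links: dict[tuple[str, str], str] = {}

    def login(self, provider: str, provider_user_id: str) -> str:
        """Return the account for an already-verified identity, creating it if new."""
        key = (provider, provider_user_id)
        if key not in self._links:
            self._links[key] = uuid.uuid4().hex
        return self._links[key]

    def link(self, account_id: str, provider: str, provider_user_id: str) -> None:
        """Attach another verified identity (e.g. GameCenter) to an existing account."""
        key = (provider, provider_user_id)
        owner = self._links.get(key)
        if owner is not None and owner != account_id:
            raise ValueError(f"{provider} identity already belongs to another account")
        self._links[key] = account_id

    def identities(self, account_id: str) -> list[tuple[str, str]]:
        return sorted(k for k, v in self._links.items() if v == account_id)


if __name__ == "__main__":
    reg = AccountRegistry()
    acct = reg.login("facebook", "fb-1001")
    reg.link(acct, "gamecenter", "G:555")
    print(reg.login("gamecenter", "G:555") == acct)  # True: same player, two logins
    print(reg.identities(acct))  # [('facebook', 'fb-1001'), ('gamecenter', 'G:555')]
```

Keeping the internal account ID separate from every provider’s ID is what lets a player add or drop a login method later without losing progress.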
Conclusion
Building a scalable server side can be very difficult and should never be underestimated. User authentication is also a problem that still needs to be solved in the long run.
As usual, let me know in the comments if you have any specific questions, I’d love to answer them.

4 comments on “TowerMadness Retrospective Part 3”

  1. Great post. I'm looking into some similar things right now for Trainyard and some future projects.

    I'm leaning towards very passive authentication using just UDIDs, but of course that has a bunch of disadvantages, like you mentioned.

    I love the idea of Google App Engine, but I've read some posts that scared me away from it, such as http://www-cs-students.stanford.edu/~silver/gae.html

    I'm also tempted to use EC2, or a service that runs on EC2 like Heroku or PHPFog.

    Anyway, you kind of hinted at it at the bottom of the post, but I'm curious about what you'd use now if you had to recreate the backend for TowerMadness from scratch.

  2. Hey Matt, thanks for the feedback.

    I would definitely choose AppEngine again. I would implement some things differently, like using deferred log analysis instead of real-time analytics (because that's just very expensive).

    I know the link you posted, but I don't think it gives an accurate picture of AppEngine. It really depends on what you want to do. For example, we've got dynamic news hosting for all our apps, and I recently optimized it so much that it costs us $0, even with a million requests a day. Latency is incredible (14ms average), and the last time that service served a single error was more than a month ago.

    But as I said in my post, the datastore is not a database. It's more like a filesystem: you can fetch files, do very simple queries, and that's it. If your app requires non-trivial searching, counting, or other queries, AppEngine may not be the best choice.

    EC2 is a lot more work to set up and get going, and it has a bunch of other issues, also depending on whether you want to use RDS (Amazon's MySQL hosting), MapReduce and other features.

    If you want to discuss your questions in more detail, feel free to drop me a mail, I'm glad to help!

  3. Ah, that clarifies things a lot, thanks for the info. I'm still in the investigative stages, so I don't have any more questions at the moment, but I'll definitely get in touch if I have more questions in the future.
