Two small AppEngine performance tidbits that saved Limbic a lot of money in the long run:
1) Queries on keys:
We’ve got a little query that fetches up to 100 games that have not been processed by our statistics bot yet. Because we have 10 statsbot instances running on 2 servers, it gets called quite a lot. We had been using a very simple query for this, but it used a lot of CPU and API CPU; it was actually our second most CPU-intensive function. I applied two small tweaks: First, I’m only fetching keys now (instead of the whole objects). Second, I doubled the size of the request from 100 to 200, returning twice as many games per request. The results? A 5-10x reduction in CPU and API CPU for this handler. That’s a few dollars a day 😀 Related AppEngine Documentation
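For illustration, here is a minimal sketch of what such a keys-only fetch can look like with the Python `db` API. The helper name and the `processed` property are hypothetical, and the model class is passed in as a parameter, so treat this as a rough outline rather than our actual handler code.

```python
def fetch_unprocessed_keys(model_class, limit=200):
    """Return up to `limit` datastore keys of games not yet processed.

    `model_class` is expected to be a db.Model subclass (for example a
    Game model). The `processed` property is a hypothetical name.
    """
    # keys_only=True asks the datastore for keys only, not full entities.
    query = model_class.all(keys_only=True)
    # Hypothetical filter: only games the stats bot has not handled yet.
    query.filter('processed =', False)
    # One request returns up to `limit` keys.
    return query.fetch(limit)
```

The caller then only loads the full entities it really needs, for example with `db.get(keys)`.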
2) Object Deletion:
Because of the increasing number of games we receive every second for TowerMadness and the storage costs on the AppEngine (tm-zero alone generates about 40 GB of data a month), we decided to delete games after a month. This also keeps the meta-game fresh, because old strategies “leave” the pool as time progresses. Anyway, it sounds like an easy way to reduce long-term costs (and these really add up), but the implementation turned out not to be that easy. The problem is, the AppEngine is really slow when it comes to deletions. When I say really slow, I don’t mean write slow. Writes are pretty slow, but deletes appear to be even slower. Another problem is the AppEngine remote_api latency. Initially, I was only able to delete about 0.5-1.5 games a second (contrast that with a peak game submission rate of about 10 games/second). It was also really expensive (the AppEngine dashboard was glowing when I ran the remote script :-). Pretty crazy for just “deleting old data”.
I’ve tried many, many different optimizations (there’s real money on the line here). What really sealed the deal was batching. I’m now accumulating a deletion list, which is flushed when it reaches the maximum size (about 100 games, because each game has up to 3 datastore entities, and 300 was the largest deletion batch size that didn’t time out too often for me). Secondly, and even more importantly, I cut down significantly on the latency by employing a small but great trick:
We’re running through the list of games with a key-index based query. We then fetch each of those games, check the date, and decide whether or not to delete it. If it is deleted, we also delete the stats object (for the homepage stats like kills, time, etc.) and the game hash entry. Initially, I used a query here (select gamehash where hash= ), and later a slightly optimized gamehash.get_by_key_name( ). Both forced the remote api to download the object from the AppEngine, though. But it turns out that you can just construct the key of an object if you know the key name, via db.Key.from_path, like:
hash_key = db.Key.from_path('GameHash', hash_key_name)
This gave an insane speedup, because now there is only one query, which is fetching the games. The other two queries were replaced, since we can get the keys we need for the db.delete function from the from_path function. This whole optimization cut the cost by an extreme amount, and actually boosted the deletion performance way beyond peak upload load, so we can now delete older games at the rate that they originally came in.
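To make the two ideas concrete, here is a rough sketch that combines batching with constructed keys. It is not our actual script: the class and function names, the 'GameStats' kind and the `stats_key_name` parameter are hypothetical. The delete and key-building functions are passed in as parameters; in the real remote script they would be `db.delete` and `db.Key.from_path`.

```python
class BatchDeleter(object):
    """Accumulates datastore keys and deletes them in batches."""

    def __init__(self, delete_fn, max_keys=300):
        # delete_fn must accept a list of keys, the way db.delete does.
        self.delete_fn = delete_fn
        self.max_keys = max_keys
        self.pending = []

    def add(self, keys):
        """Queue one game's keys, flushing first if they would overflow."""
        if self.pending and len(self.pending) + len(keys) > self.max_keys:
            self.flush()
        self.pending.extend(keys)

    def flush(self):
        """Delete everything queued so far in a single call."""
        if self.pending:
            self.delete_fn(self.pending)
            self.pending = []


def keys_for_game(game_key, hash_key_name, stats_key_name, key_from_path):
    """Build all keys to delete for one game without fetching anything.

    key_from_path is expected to behave like db.Key.from_path(kind, name).
    'GameHash' is the kind mentioned above; 'GameStats' is a hypothetical
    kind name for the stats object.
    """
    return [
        game_key,
        key_from_path('GameHash', hash_key_name),
        key_from_path('GameStats', stats_key_name),
    ]
```

Usage would be along the lines of `deleter = BatchDeleter(db.delete)`, then `deleter.add(keys_for_game(game.key(), hash_name, stats_name, db.Key.from_path))` for every expired game, followed by one final `deleter.flush()` at the end of the run.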
All in all, I’m extremely happy with the choice of the Google AppEngine for TowerMadness, but I guess you really have to spend some time to make it perform really well. One way or another, TowerMadness Zero showed us that it scales better than anything I’ve seen before.