The Evolution of Zynga's zCloud: Interview with CTO of Infrastructure, Allan Leinwand

Rick Freedman's extensive interview with Allan Leinwand, the CTO of Infrastructure for Zynga, yields important insights for successfully implementing a hybrid cloud infrastructure on a large scale.

Zynga's gotten a lot of press lately - its own IPO has been successful, its connection to the highly anticipated Facebook IPO has raised its profile, and it's made some announcements recently that have intrigued the gaming and social media universe. Back in February, Zynga announced the evolution of zCloud, a hybrid cloud composed of Zynga-owned infrastructure and public cloud offerings like Amazon Web Services, integrated by common tools and management capabilities to position Zynga for its next round of hyper-growth. On March 1, Zynga announced the Zynga Platform, designed to enable players to find new ways to play, and more people to play with. As part of the Platform's initial rollout, Zynga will debut the beta release of the company's new destination for social games. In addition to serving up popular Zynga games, the new destination will let players discover and play social games created by third-party game developers. These Platform partners will use the Zynga Platform to reach new audiences and make their games even more social. These announcements have created a lot of buzz in the business and technical press, with coverage ranging from the New York Times to Wired magazine.

I recently had a chance to sit down with Zynga's CTO of Infrastructure, Allan Leinwand, fresh from his presentation at the Cloud Connect Conference in Santa Clara, CA, to discuss in depth the technologies and strategies that are driving these new developments at Zynga.

TechRepublic: Rather than starting from inside Zynga, let's start with the customer: I'm a gamer; I pick up my iPad to play a Zynga game -- how do I navigate through the network to get to my game?

Allan: Let's say you're playing Words with Friends. You access your game on your end device, whether it's a browser, phone, or tablet; you've downloaded that game from the Android marketplace or iTunes. If it's a Facebook game, it's a Flash app that comes down to your browser. That client will begin to communicate back to our zCloud infrastructure, and make changes in the gameboard. On Words with Friends, for example, you're changing the tiles on a gameboard, as we say; on more graphic games like Farmville or Mafia Wars, those are live surfaces that are constantly changing, and due to the actions of other players, the environment is constantly changing. You could be visiting your own gameboard back in our infrastructure, or visiting other people's gameboards as well.

TechRepublic: My understanding is that you started with the idea that you'd use public cloud infrastructure, in your case, Amazon Web Services (AWS), to try out and prove a game, and then, when that game achieved critical mass, you'd migrate it to your own infrastructure. Is that accurate, and has the zCloud idea been a fundamental strategic shift?

Allan: We've evolved quite a bit over the last three years. Back when we started, we launched our games with a hosting company; they set up a rack of servers; we provisioned our software in the back end; players started to access our games; and our developers created new games with new mechanics and new features. We were applying a typical Web 2.0 co-lo (co-location, or hosting) model. We quickly found that that wasn't really a good fit for our business model. We found that players were discovering real moments of joy with our games. We had developers creating new games all over the globe, and we quickly outpaced our cooling -- outpaced our ability to get new gamespace, our ability to get servers provisioned. In short, we ended up running out of space, time, and power. The real inflection point for that was back in 2009 with Farmville. The best way to think about it was that we were growing at a pace and capacity we couldn't predict or control. Farmville went from zero to ten million daily players in six weeks, and then grew to 25 million daily active players in the first five months.

TechRepublic: Didn't this scare you? It's a great business moment but from a technical perspective it's got to keep you up at night?

Allan: The best way to put it is that our main concern was keeping the infrastructure out of the way of the players. We were building great new games, adding features that customers loved, and our focus was -- how do we make sure that our infrastructure doesn't get in the way of people having fun on Farmville? How do we build the infrastructure capacity so we can keep growing?

So we reached this inflection point back in '09, and we decided we were going to stop trying to match that growth curve with our own co-location-based infrastructure, and we were going to move out to the public cloud. We decided to move Farmville out to Amazon Web Services, and we grew our Amazon cloud and started to introduce and grow our newer games there as well. Amazon and the public cloud became really important for us.

We liked the management tools that were available and the way that they scaled. We kept our private infrastructure -- we still had servers and co-los -- but we were using the capabilities and flexibilities that Amazon provided more than anything else. The way we think about it is that public cloud really revolutionized the way we managed our infrastructure. We continued to grow and scale with AWS, and then we launched CityVille, which grew even faster than Farmville. We really enjoyed the benefits of using the public cloud throughout that time. At the same time, we came to the realization that we were renting what we could own. The public cloud isn't your own infrastructure; it isn't something you can own and operate in your own way, and it isn't capital equipment, so it's an operating expense.

TechRepublic: This is an important point, because I think a lot of companies that struggle with the public/private cloud question have to think about the financial as well as the operating implications of their decision.

Allan: I think the decision factor for folks is this: if you don't understand the workload of the application, or you don't understand how successful the application is going to be in the marketplace, you may want to co-lo or rent. But once you understand the workload and the business model, you need to control the infrastructure in a way that makes sense for your business. If your growth curve is a near vertical line, it's really hard to get ahead of that. Once you get a handle on the growth curve of your app workload, you can start to partner with vendors and start to build out your own facilities. So we started to do that, and we called it zCloud.

Our initial goal on zCloud was to make it mirror what we had with Amazon Web Services. We wanted our game studios, developers, our players, the folks who really banged on our infrastructure, to not have to worry about -- are they using Amazon or are they using zCloud? We wanted them to ignore the infrastructure and just play our games in a way that made them feel good. I'd hate to think that when you're playing Words with Friends, you're thinking about our server colony. I want to think that if you're playing Words with Friends, you're thinking about playing that triple-word score! I want to know that, when you're thinking about buying the extension into the next area of CastleVille, you think only about visiting your friends and sharing player experiences. It's socially accessible and fun, and not about our server capacity. Infrastructure should be a non-event for our players and a non-event for our game developers. That's our job as an infrastructure team, to worry about that and scale appropriately.

So, to go back to this period around 2010, we have zCloud that we've launched internally, but we're continuing to launch our games in the Amazon Web Services cloud. Once we launched games into Amazon, and we understood the slope of the (uptake and workload) line, then we could do the right capacity models and the right infrastructure planning, so we could make decisions about when we could begin to bring those games back under zCloud. As long as we could track the slope of the line and build out the appropriate capacity and infrastructure under zCloud, we were ready to start moving games back under zCloud. So, at the beginning of 2011, we had 20% of our daily active players playing in zCloud, with the remaining 80% still in Amazon. We were interconnecting these key pieces of infrastructure, with direct connections between Amazon and zCloud, and we'd begun to build out zCloud facilities on the East Coast and the West Coast. We'd come to an architectural model of launching in Amazon, using all the benefits of the public cloud, and leveraging the ability to reverse back to the private cloud where it made sense for the business and for capacity.
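The capacity reasoning Allan describes here (track the slope of the uptake line, then decide when owned capacity can absorb a game) can be sketched as a small calculation. This is an illustrative sketch only, not Zynga's actual model: the function names, the least-squares linear fit, the players-per-server figure, and the sample numbers are all assumptions made for the example.

```python
from math import ceil


def growth_slope(daily_players):
    """Least-squares slope (players per day) of a daily-active-player series.

    Hypothetical helper: day indices 0..n-1 are used as the x values.
    """
    n = len(daily_players)
    if n < 2:
        raise ValueError("need at least two daily observations")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_players) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_players))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


def servers_needed(daily_players, horizon_days, players_per_server):
    """Project the latest value forward along the fitted slope and size for it.

    players_per_server is an assumed planning constant, not a figure from
    the interview.
    """
    slope = growth_slope(daily_players)
    # A shrinking game is sized for its current load, not a negative projection.
    projected = daily_players[-1] + max(slope, 0) * horizon_days
    return ceil(projected / players_per_server)


def ready_to_migrate(daily_players, horizon_days, players_per_server, owned_servers):
    """True when owned capacity covers the projected load over the horizon."""
    needed = servers_needed(daily_players, horizon_days, players_per_server)
    return needed <= owned_servers


if __name__ == "__main__":
    # Made-up sample series: daily active players over four days.
    series = [100, 200, 300, 400]
    print(growth_slope(series))
    print(servers_needed(series, horizon_days=6, players_per_server=250))
    print(ready_to_migrate(series, 6, 250, owned_servers=4))
```

A linear fit only makes sense once the curve has settled, which matches the point made earlier in the interview: while growth is a near-vertical line, renting elastic capacity is the safer choice, and a projection like this becomes useful only after the slope can be tracked.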


By Rick Freedman

Rick Freedman is the author of three books on IT consulting, including "The IT Consultant." Rick is an independent consultant and trainer, working, through his company Consulting Strategies Inc., to help agile teams and organizations understand agile...