Western entrepreneurs still haven't figured out China. For most, the problem is getting China to pay for software. The harder problem, however, is building software that can handle China's tremendous scale.
There are scattered examples of success, though. One is Alluxio (formerly Tachyon), which I detailed recently in its efforts to help China's leading online travel site, Qunar, boost HDFS performance by 15X. Alluxio CEO and founder, Haoyuan Li, recently returned from China, and I caught up with him to better understand the big data infrastructure market there, as China looks to spend $370 million to double its data center capacity in order to serve 710 million internet users.
This could get loud.
Open sourcing China
One of the most interesting things about big data is that all of the best data infrastructure is open source. As Cloudera co-founder Mike Olson has made clear, "No dominant platform-level software infrastructure has emerged in the last ten years in closed-source, proprietary form." This is particularly true in the world of data infrastructure.
Historically, China would have benefited from such bounty, but in the area of big data, China is not merely consuming the West's best software: It's open sourcing its own. Baidu, for example, has just announced the open sourcing of its machine learning platform, PaddlePaddle, under an Apache license. According to Li, "This is as significant as when Google open sourced its machine learning platform, TensorFlow."
Baidu's action suggests a shift in how China thinks about software. In December 2014, China's Ministry of Industry and Information Technology (MIIT) declared its support for OpenStack for state-owned enterprises. Not long after, Tencent embraced the OpenDaylight Foundation's SDN instead of developing its own proprietary distributed cluster SDN controller, as Neela Jacques uncovered. Across China, similar efforts to use, and increasingly contribute, open source code have flourished.
This is critical because, as Li told me, China's scale puts all software to the ultimate test.
Hitting China scale
As Li stressed, "Many of our largest production deployments are in China, and that's on purpose." That "purpose" is to stress-test Alluxio's software under the most demanding situations.
For example, Baidu has started speaking publicly about the open source infrastructure powering its driverless car initiative. Huawei, for its part, actively promotes its FusionInsight product, which heavily depends on a variety of open source technologies (to which it increasingly contributes). Tencent offers a range of open source infrastructure projects, covering everything from data warehousing to mobile network acceleration.
These represent China's efforts to open up. But as my conversation with Li makes clear, Western companies (and the open source projects they back) need to be promoting their code in China, too. The point is not only potential commercial gain, but also getting China's best enterprises to stress-test that code, even as China's best developers are encouraged to adopt it. That's a big reason MongoDB has worked closely with a variety of organizations in China, winning plaudits from China's largest car-hailing service, Kuaidi, among others.
Because, if you can meet China's scale demands, everything else is easy.
Matt Asay is a veteran technology columnist who has written for CNET, ReadWrite, and other tech media. Asay has also held a variety of executive roles with leading mobile and big data software companies.