It's no secret that open source now dominates big data infrastructure. From Kubernetes to Hadoop to MongoDB, "No dominant platform-level software infrastructure has emerged in the last ten years in closed-source, proprietary form," as Cloudera chief strategy officer Mike Olson reminded us.
Not only is this true of data infrastructure, it's also increasingly true of the languages we use to query and analyze data. Though proprietary data-analysis languages like Matlab and SAS arguably were doing big data before big data was cool, they're plummeting in popularity, according to new rankings from IEEE.
In short, abandon hope all ye proprietary programming languages who enter here.
Open source rises, proprietary falls
Given the ever-increasing importance of developers, it should surprise no one that open source data infrastructure and programming languages are ascendant. Open source is, after all, the lingua franca of development.
Even so, the amount of movement between language rankings in the last two years is pronounced and, apparently, irreversible.
Comparing 2014 with 2016, the IEEE rankings, which pull data from diverse sources like GitHub repositories, Stack Overflow mentions, and more, show a rise in Go and R that is as impressive as the fall of Matlab. R, for its part, climbed from #9 (2014) to #5 (2016). According to IEEE's Nicholas Diakopoulos, that climb was driven by a 46% increase in Stack Overflow questions, as developers seek to better understand how to put R to use, as well as by a boom in scholarly articles mentioning R. (IEEE heavily indexes scholarly publications.)
Even more impressive than R, however, is Go, the open source language first released by Google. Its rise rests in large measure on a 5X boom in active GitHub repositories defaulting to Go as their primary language: developers have gone gaga for Go. Go may even give the venerable Java a run for its money, given developers' propensity to use it to build cloud applications.
At the same time, as Diakopoulos points out, "Contrary to the substantial gains in the rankings seen by open source languages such as Go, Julia, R, and Scala, proprietary data-analysis languages such as Matlab and SAS have seen a drop-off." To wit, Matlab has dropped four places over the past two years and SAS has plummeted seven.
RedMonk analyst Stephen O'Grady uncovered the same general trends, though the movement in places is less dramatic. RedMonk's rankings see Matlab fall from #16 in 2014 to #17 in 2015 and #18 in 2016. (SAS doesn't show up in the top-20 languages.) At the same time, R has climbed from #13 in 2014 to #12 in 2016 while Go jumped from #21 in 2014 to #15 in 2016. O'Grady has charted these different trend lines.
In sum, not a great time to be anything other than open source in big data land.
A glimmer of hope?
Of course, language popularity rankings don't tell the full story. For example, for all R's growth, its popularity increase owes more to the academic crowd than the enterprise crowd (though it's clearly rising with both populations).
As for Matlab and SAS, a eulogy is probably premature. Diakopoulos stresses that "both of those languages are still growing." The problem, as he continues, is that "they're not growing as fast as some of the languages that are displacing them." In a world defined by the three Vs of volume, variety, and velocity, such slow growth may be equivalent to a death knell.
The cure, of course, is open source.
No, open source won't necessarily pay the bills at SAS, but it certainly seems to be helping to revive Microsoft's fortunes after the company bought Revolution Analytics, a primary developer behind R.
The key is to open source the core tools needed to make developers productive, and then figure out the complements (proprietary or otherwise) that enterprises will pay for. For anyone looking to make a mint off proprietary programming languages, these new IEEE rankings are a reminder that they don't have much time to figure out a different strategy.
- Will Go give Java a run for its money? (TechRepublic)
- Apache Spark rises to become most active open source project in big data (TechRepublic)
- NoSQL keeps rising, but relational databases still dominate big data (TechRepublic)
- Election Tech: How big data pioneers use open source technology to win elections (TechRepublic)
- For data scientists, the big money is in open source (TechRepublic)
Matt is currently head of the developer ecosystem at Adobe. The views expressed are his own, not those of his employer.
Matt Asay is a veteran technology columnist who has written for CNET, ReadWrite, and other tech media. Asay has also held a variety of executive roles with leading mobile and big data software companies.