Exponential growth of R's open source community threatens commercial competitors

With more than 2 million users and developers behind R, how can proprietary vendors stand against the open source community of the R programming language and software environment?

Sure, you may be a shy, introverted quant nerd, but the R community waits with open arms to receive you. And share its bounty with you. So much so, in fact, that the total number of CRAN (Comprehensive R Archive Network) packages keeps mushrooming roughly exponentially as the data science elite share at a frenetic pace.

Want to analyze cricket performances based on data from Cricsheet? There's a CRAN package for that. Or maybe it's more to your taste to use convenient functions for ensemble time series forecasts? There's a CRAN package for that, too. Indeed, as of this writing, there are 8,270 packages available for R, a number that has exploded from just a few hundred in 2006, as Andrie de Vries models.
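
To make "roughly exponentially" concrete, here is a minimal back-of-the-envelope sketch of the constant yearly growth rate implied by two package counts. Only the 8,270 figure comes from the article; the starting count of 500 (a stand-in for "a few hundred") and the 10-year span (assuming the count dates from about 2016) are hypothetical inputs, and the function names are illustrative. It is not de Vries's model.

```python
import math

def implied_annual_growth(start_count, end_count, years):
    """Constant yearly growth rate that turns start_count into end_count over `years`."""
    return (end_count / start_count) ** (1 / years) - 1

def doubling_time(rate):
    """Years needed to double at a constant yearly growth rate."""
    return math.log(2) / math.log(1 + rate)

ASSUMED_START = 500   # hypothetical stand-in for "a few hundred" packages in 2006
END_COUNT = 8270      # package count quoted in the article
ASSUMED_YEARS = 10    # assumption: the quoted count dates from about 2016

rate = implied_annual_growth(ASSUMED_START, END_COUNT, ASSUMED_YEARS)
print(f"implied growth: {rate:.1%} per year")
print(f"doubling time: {doubling_time(rate):.1f} years")
```

Under those assumed inputs, the package count would be growing by roughly a third each year and doubling about every two and a half years. A different starting count changes the numbers but not the shape of the curve.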

Indeed, R is growing in power at such a rate that it's hard to imagine proprietary competitors being able to keep pace.

Hitting the accelerator on community

Actually, it's not hard to imagine. It's impossible. As Bob Muenchen details in a May 2015 StatsBlog post, "During 2014 alone, R added more functions/procs than SAS Institute has written in its entire history." SAS is, of course, the most dominant proprietary alternative to R, an open source programming language and software environment for statistical computing.

SAS generated over $3 billion in 2015 revenue. While companies make money around R, the growth in its functionality isn't fueled by profit. It's paid for with community, one that has accelerated innovation in R to such an extent that "R now contains 150 times as many commands as SAS," as Muenchen points out. Muenchen's article was written almost a year ago, and the divide has only grown since then.

The comparison between SAS and R commands isn't perfect, of course. Muenchen writes:

Of course while SAS and R commands solve many of the same problems, they are certainly not perfectly equivalent. Some SAS procedures have many more options to control their output than R functions do, so one SAS procedure may be equivalent to many R functions. On the other hand, R functions can nest inside one another, creating nearly infinite combinations.... While the comparison is far from perfect, it does provide an interesting perspective on the size and growth rate of R.

One way to illustrate this impressive growth is by looking at how fast new functionality has been added to R through CRAN packages, as de Vries does (Figure A).

Figure A: Growth in CRAN packages (image: Andrie de Vries)

While this chart doesn't by itself indicate whether "the contribution rate is steady, accelerating or decelerating," as de Vries highlights, it does indicate dramatic growth in R's utility. Ironically, one of the hardest tasks when using R is now finding the best packages for a particular task, given the volume of packages available.

Indeed, one of the unintended genius moves of the R core development team was to open up R development through the package system, as John Fox describes: "In a sense, the package system — like version control — is a technological solution to a social problem: how to invite, motivate, and coordinate the activity of hundreds of volunteers without overwhelming the resources of the Core team."

Popular with the data crowd

Today, R is tied with Python among programming languages used for data analytics, as a 2015 O'Reilly survey found. For its part, Python is now up to nearly 80,000 packages, as Python developer Matt Harrison pointed out to me.

As he states, "Great communities aid adoption."

We should expect this to continue. Last year, the R Consortium was formed to provide a way for companies to collaborate around R, beyond the less-commercial community work done by the R Foundation. At that time, Linux Foundation Executive Director Jim Zemlin captured the ethos that fuels R development: "Millions of data scientists and academic researchers use R language every day and want to collaborate with their peers to share visualization and analysis techniques."

Open source tends to work in disciplines with a broad talent pool of people with an interest in and aptitude for sharing code. This describes the R community quite well: a technical community with the ability to build R packages and a natural propensity to share that work. And at the rate that the R community shares, it's hard to see how any single commercial entity can hope to compete long term.

    About Matt Asay

    Matt Asay is a veteran technology columnist who has written for CNET, ReadWrite, and other tech media. Asay has also held a variety of executive roles with leading mobile and big data software companies.
