Hadley Wickham is part of a growing movement of statisticians and data scientists who champion R as a convenient, easy-to-use tool for data analysis.
Wickham is the brains behind the popular dplyr package, which makes it easier to manipulate data. He’s developed or co-developed others including tibble, ggplot2, glue, and pillar.
Many are used widely by companies like The New York Times, Facebook, and Google. His fans have even dubbed his creations the tidyverse.
“Initially, it was a language that was primarily used by statisticians, so the assumption was that people using R had a Ph.D. in statistics,” Wickham said. “With the rise of data science, the popularity of R has massively expanded. Lots of people from many different backgrounds and many different domains are using it now to figure out what’s going on with your data.”
“The thing that really drew me to R was that flexibility and power it gives you to really wrestle with your data and ask it questions and to figure out what’s going on in a very fluid and interactive way,” he added.
Statistics runs in Wickham’s family: his father and sister have Ph.D.s in statistics. He got his start with the R language 15 years ago as an undergraduate at the University of Auckland, where the statisticians Ross Ihaka and Robert Gentleman created R in 1993.
Wickham is now the chief scientist at RStudio and serves as an adjunct professor of statistics at the University of Auckland, Stanford University and Rice University. His work with R has made him something of a celebrity in the data science field, with many of his fans flooding forums with gratitude for his packages.
His tools have simplified the somewhat arcane code needed to handle things like data aggregation and plotting. This has made R applicable to almost any industry in need of a way to organize data.
Wickham said he was honored to see people at government agencies like the Food and Drug Administration and companies like FiveThirtyEight and Twitter use his packages. He highlighted R’s adoption by pharmaceutical companies, which use it to design clinical trials, analyze their results and support other parts of the drug discovery pipeline.
“A bunch of people in finance use it, as well as insurance and academia. If you’re involved in any discipline that collects data, it works. It’s getting more popular in economics and lots of biologists and ecologists use it. It’s helpful for people who do not have a traditional quantitative background but are now having to wrestle with data. Journalists are a good example,” he said.
“Part of it is that it was designed by statisticians. The very heart of the language is designed specifically for the types of problems you encounter doing data analysis.”
Wickham, a native of Hamilton, New Zealand, has been working with databases since he was 15, when he began developing them in Microsoft Access.
His ggplot2 package, one of the most popular, has been downloaded by millions who praise how much easier it makes data visualization. The goal of many of his packages is to remove the hard parts so that more people have access to tools that simplify working with their data.
His goal for the future is to continue the expansion of R across the world to diversify the pool of people who use it. A downside, he said, is that it can be difficult to use R without speaking English.
Groups are now translating some of his books about R into Spanish and other languages so that more people can gain a foothold in the language.
“One of the things I’m interested in is making sure that everyone who wants to use R can use R. I went to the Latin-R conference in Chile and I asked myself, ‘How do we help people whose first language is not English use R?'” he said.
“So a community in Latin America recently translated my book ‘R for Data Science’ into Spanish and one of the neat things they did is they also translated some of the data sets, so that the names of the data sets and the names of the variables are in Spanish as well.”
He hopes there can be more interaction and exploration between R and competing languages like SQL and Python. The idea, he said, should be to simplify data analysis so that anyone can use these tools for any kind of data. He joked that he even scraped data from his yoga class website and was able to play around with it using R.
There are many people who are not programmers, statisticians or mathematicians but are forced to handle data.
“How do we help those people learn R through sort of a combination of better tools that are better to understand and easier to learn and better teaching and better resources?” he said.
The fairly recent popularization of the R language has made its user base one of the most diverse, with communities across the world and a particularly large community of women, who have dubbed themselves R-Ladies.
“What’s special about the R community is the R-Ladies community, which is a relatively recent thing. A bunch of meetups around the world are now aimed at women and other gender minorities,” he said.
“It’s really been having an impact on the gender diversity of the R community.”