Most GitHub repositories still don't carry a license. This is a problem. Matt Asay explains why.
Open source has never been more pervasive, driving tectonic shifts in mobile, cloud computing, and big data. And yet open source licensing remains an afterthought for the legions of coders who flock to GitHub, as new data suggests.
GitHub has made significant efforts to get its developers to thoughtfully license their software, but most developers still don't bother. This can create problems for those who choose to deploy this seemingly unlicensed software.
But rather than eschew the mountains of code being released on GitHub, would-be adopters of GitHub code need to start asking that the code be licensed. It may be the only way to reverse the seemingly permanent shift toward completely "open" source, meaning code with no stated license at all.
In the beginning was SourceForge
GitHub is a relative newcomer to source code hosting, but it has quickly become the only credible repository. SourceForge was the first big code repository, followed by Codehaus, Microsoft's CodePlex, and Google Code, among others.
While CodePlex still shuffles about, SourceForge gave way to Google Code years ago. Indeed, Google Code looked set to dominate code hosting for a long time, until GitHub hit overdrive. Recently, both Codehaus and Google Code shut down, rendered obsolete by GitHub.
Not that Google is blameless.
According to Andrew Binstock, Editor in Chief of Dr. Dobb's Journal, Google Code was once the industry's developer darling, but Google's missteps made it vulnerable to GitHub:
"When it first came out, [Google Code] was a mecca for new projects, seeded by the many projects of Google employees. It had a fairly low-tech UI, but it was fast and simple to operate. In addition, it had a variety of features that appealed especially to programmers. However, Google moved slowly in improving the site to recognize changing developer preferences. And the company did not update its peculiar UI. And it was less than helpful to user requests. All were crucial missteps."
GitHub, for its part, put developers first, which was arguably its "killer feature." By putting developers first, I mean that GitHub, more than any other code repository, put code first, as an excellent Wired article detailed. Rather than land a developer on some Overview page, GitHub immediately pushes developers into the code, and then makes it easy to fork that code.
But one of the potential pitfalls of this easy experience is that most GitHub developers continue to ignore open source licensing.
The kids don't care
A shift to permissive open-source licensing has been in full gear for many years, as RedMonk analyst Donnie Berkholz captured, but GitHub takes this trend to new levels. Or, as free software luminary Glyn Moody declared, "the logical conclusion of the move to more 'permissive' licences [is] one that permits everything."
Which is precisely what GitHub's license data shows.
Two years ago, GitHub released data showing the typical open-source licenses chosen by the millions of projects it hosts--or, rather, the lack thereof.
A mere 14.9% of projects on GitHub in 2013 identified a license at all, as Neil McAllister uncovered. Two years later, things have improved, but not as much as GitHub's Ben Balter wants us to believe.
As Balter's chart shows, GitHub's introduction of choosealicense.com and its license picker helped to reverse a long decline in licensed repositories (that is, code repositories that explicitly identify a license). But the same chart also shows a renewed slide toward licensing anarchy.
Balter said "the results exceeded even my highest open-source expectations." If so, he may need to upgrade his expectations.
After all, it's not as if unlicensed code is, well, unlicensed. All software, whether explicitly licensed or not, carries a copyright. That's just how copyright law works.
So, if all this "unlicensed" software is actually copyrighted, with no permissions spelled out, what are the rights of the millions of developers using that code? Therein lies the problem.
Balter continued, "Developers use GitHub because they want to share their code with the world, and the data suggests that when the tools we use make it a little bit easier, developers do just that. When presented with the option, they choose to license, and they license very permissively."
But that's not really what the data says. The data shows that even when presented with the option to license, the vast majority of GitHub developers don't license their code under any particular license, open source or otherwise. Which means that GitHub remains a morass of software of dubious license parentage.
Should you use it, anyway?
Does this matter? Yes, it absolutely does. But should it keep you and your company from using GitHub software? No, it absolutely does not.
After all, as Berkholz noted, "As projects grow, they tend to sort out any licensing issues, likely because they get corporate users, professional developers, etc." The more popular code, in other words, will generally find its way to a mainstream license, because enough people who expect mainstream licensing behavior will get involved.
But there's another option, and that is to force the change. Would-be contributors to or users of a GitHub project can and should ask the code creators to choose a license. It takes very little time and can save a lot of bother later. (Try going through a code review at the time of a company acquisition if you've elected to deploy a mass of dubiously licensed software in your products. It's not pretty.)
With most project sponsors, simply raising the issue will be enough to get them to choosealicense.com. They may not recognize the importance of doing so until you ask.
So ask. After all, it's your code, too--or can be, with the right license.