Open source scale and sprawl: This compliance company has a new way to tackle it

Commentary: Open source has never been more popular, which makes keeping track of it in an M&A deal so much harder than it used to be.


Image: boygovideo, Getty Images/iStockphoto

There's no such thing as 100% proprietary software anymore. Virtually all software, whatever its end-user license, includes open source code under a wide variety of licenses. Open source is everywhere, in everything. Small wonder, then, that code scans have become a critical component of M&A, procurement, and more, as companies seek to understand what risks (and opportunities) they're assuming when they bring another company and its software into their own.


The very omnipresence of open source, however, creates problems for anyone hoping to analyze its inclusion in a given piece of software. Roughly 50 projects get added to GitHub every minute (nearly one per second), and over 100 million already exist there. Tracking that code becomes a serious problem of scale, one that FossID, an open source compliance software vendor, believes it's uniquely positioned to tackle.

A knowledge base is...dumb

The standard approach among open source compliance tools is to build a knowledge base. That is, these tools take the most widely used open source projects, spend a fair amount of effort cleaning up the data about them, and load the result into a relational database. According to Oskar Swirtun, co-founder and CEO of FossID, this nets such vendors two to 10 million projects in their knowledge bases.


While such numbers may sound impressive, these knowledge bases are failing to keep pace with the explosive growth of open source, said Swirtun, particularly given their manual approach to data cleaning. "Open source is growing so fast that to do any manual work when you collect information about open source projects is impossible," he said in an interview.

What's needed instead is an automated approach, Swirtun argued, one that doesn't require data to be structured for a relational model. FossID took this route, building its own purpose-built NoSQL database that can ingest data from a broad array of unstructured sources like Stack Overflow, along with "huge amounts of projects" from GitHub or any other open source repository. Today the company holds information on approximately 35 million projects. In all, querying more than 100 different sources yields 2 petabytes of data, which the company compresses to just a few terabytes, enabling an average scan rate of 70 files per second.

In other words, FossID aims to be the "Google of open source compliance," rather than the Yahoo of old (which compiled its directory of websites by hand).

This approach has proven helpful for uncovering code needles in the proverbial haystack. For example, Microsoft posts a great deal of .NET reference code, and Oracle publishes its own Java reference code. Often, Swirtun noted, developers will take these snippets of reference code and include them in their applications. The hitch? This code is often proprietary. FossID indexes and surfaces it all in ways that traditional knowledge bases simply aren't equipped to do.

Software isn't going away anytime soon

Though FossID's customers used to hail primarily from the US, Swirtun said that's changed. Asia-Pacific and Europe are both gathering steam, largely because software has become the critical asset in any given business:

The biggest assets that people are buying today is actually company software. So a prospective buyer needs to know more about that asset. In a compliance audit you get a lot of information, not just about the code, but also about how mature the organization is, how well they work [in part based on] what kind of open source they use. This says a lot about the maturity of the organization. Audits are therefore becoming standard.

Swirtun said that though he's never seen a deal scuppered because of an audit, he regularly sees it affect a company's negotiating position. Knowledge is power, and knowledge about how a company uses open source (and proprietary, as mentioned) software is valuable in an M&A deal and beyond. 

As for those who think they needn't bother because they're buying a proprietary asset, well...think again. "We have never seen software that didn't include open source," Swirtun said. "I think it would have to be some company that is ancient [to not be using open source]. I don't see how you build a company today without open source."

Disclosure: I work for AWS, but the views expressed here are mine and don't represent those of my employer.
