Open source powers AI, yet policymakers haven't seemed to notice

Commentary: Open source is driving AI policy without policymakers getting involved, argues a Brookings Institution briefing.

"Open source software quietly affects nearly every issue in AI policy," wrote Alex Engler in a Brookings Institution briefing, yet this is barely discussed by government policymakers. This is a mistake, and it's one that crosses the political aisle. The Trump administration barely mentioned open source in its AI policies, while the Obama administration touted open source as driving AI innovation but stopped there. In Europe things are no better, with new regulations about AI skipping the topic of open source entirely. 

Given how prevalent open source has become in the artificial intelligence software that companies and governments use, policymakers would do well to pay attention, noted Engler. 

Open source is powering AI innovation, but at what cost?

One reason open source is so heavily used in AI is that it increases innovation while significantly lowering the bar to productivity. According to Engler, "[W]ell-written open-source AI code significantly expands the capacity of the average data scientist, letting them use more current machine learning algorithms and functionality." Open source AI code gives data scientists high-powered tools without requiring them to become high-powered mathematicians. 

Open source also allows researchers to more easily replicate results that others have produced. "OSS [open source software] is most directly helpful to reproducible research because the same OSS is available to many different researchers," said Engler. 

All of this is great and helps would-be AI developers accomplish more. Yet there are problems with how open source AI code is developing, Engler noted.

While OSS is often associated with community involvement and more distributed influence, Google and Facebook appear to be holding on tightly to their software. Despite being open-sourced in 2015, the overwhelming majority of the most prolific TensorFlow contributors are Google employees, and Google pays for administrative staff to run the project. Similarly, almost all of the core developers for PyTorch are Facebook employees. This isn't surprising, but it is noteworthy. Even in open sourcing them, Google and Facebook are not actually relinquishing any control over the development of these deep learning tools.

This may be how corporate open source often works, but it can have negative implications. "By making their tools the most common in industry and academia, Google and Facebook benefit from the public research conducted with those tools, and, further, they manifest a pipeline of data scientists and machine learning engineers trained in their systems," Engler stated. I've written about this before, detailing how big vendors increasingly use open source as an on-ramp to proprietary services.

While this may not seem to matter, it points to potential landmines. "The apparent dominance of Tensorflow and PyTorch means that Google and Facebook have outsized influence in the development and common use of deep learning methods—one they may be reluctant to cede to consensus driven organizations" like standards bodies, argued Engler. It's not that these companies are necessarily nefarious, but for government policymakers, ceding control of such an important area of innovation may be short-sighted, Engler suggested. As he asked, "Are we comfortable with an AI world dependent on open source, but entirely corporate controlled, software?"

It's a good question, and it's one that policymakers would do well to try to answer.

Disclosure: I work for MongoDB, but the views expressed herein are wholly my own.
