An initiative at the Debian Project to provide reproducible builds of the binary packages in its repository has this month reached 83.5% completion. Although the sources of these packages have long been open, compiling them independently has usually produced something other than a bit-for-bit identical copy of the distributed binary. The initiative seeks to close that gap, and provides tools to audit the differences between binaries.
Why is this important?
Reproducible builds have gained importance and visibility following the disclosure of global surveillance activity, which raised concerns about the integrity of both the sources and the compilers used to create distributable binaries. Bitcoin users voiced concerns about malicious binaries, which led to the development of Gitian in 2011 to verify the integrity of those binaries. The Debian initiative will allow any user to confirm that the binaries distributed in the Debian package repository were built from an unaltered source package, free of any interference at build time.
We often speak as if open source software can't contain backdoors or malware because its source code is "published", rendering any potentially malicious code visible. But real-world software release processes have major transparency gaps that aren't addressed by most existing open source development practices.
This initiative for reproducible builds sheds much-needed light on the least transparent part of this process, but its scope is limited to the source as packaged for this Linux distribution. Reproducibility is achieved primarily by developing a toolchain and automated build process that regulates the behavior of compilers, normalizes variables such as timestamps, paths, and user names, and eliminates sources of randomness at compilation time, such as program assets being evaluated in a different order.
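The normalization described above can be illustrated with a small sketch. This is not Debian's tooling; it is a minimal Python example, and the helper name `build_archive` and the sample file names are assumptions made for illustration. It packs the same files into a `.tar.gz` twice, fixing the timestamp, ownership, and entry order so that the two results are byte-for-byte identical.

```python
import gzip
import hashlib
import io
import tarfile


def build_archive(files, mtime=0):
    """Pack {name: bytes} into a .tar.gz whose bytes depend only on the inputs.

    Hypothetical helper for illustration; not part of any Debian tool.
    """
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):          # fixed entry order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime              # normalized timestamp
            info.uid = info.gid = 0         # normalized numeric ownership
            info.uname = info.gname = "root"  # normalized user and group names
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    out = io.BytesIO()
    # An empty filename and a fixed mtime keep build-specific data out of
    # the gzip header.
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=mtime) as gz:
        gz.write(raw.getvalue())
    return out.getvalue()


def sha256(data):
    return hashlib.sha256(data).hexdigest()


# The same inputs, supplied in a different order, as two builds might see them.
first = build_archive({"b.txt": b"beta\n", "a.txt": b"alpha\n"})
second = build_archive({"a.txt": b"alpha\n", "b.txt": b"beta\n"})
print(sha256(first) == sha256(second))
```

Because every varying input is pinned, comparing the two SHA-256 digests is enough to confirm the archives match. Passing a different `mtime` to one of the calls changes the output, which is the kind of difference a reproducibility check would flag.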
Between project developers, package maintainers, compilers, and end users, there are many points at which an issue can arise and unintentionally introduce a vulnerability that is not readily apparent. In one such case, a Debian package maintainer's change to initialization in OpenSSL left encryption keys insufficiently random, and the affected keys had to be regenerated. The reproducible builds initiative does not address differences between the Debian package and the published project source; those differences will still need to be audited by the user.
What work remains on this project?
The project can verify that 83.5% of packages in main can be rebuilt reproducibly, a figure reached after slightly over a year of work. At present, many of the 2,628 packages that are not yet reproducible embed generated timestamps (for example from C/C++ preprocessor macros such as __DATE__ and __TIME__, generated documentation, or gzip headers), which vary between builds and platforms. Less common are issues with randomness introduced at compile time or in program assets.
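To show how a timestamp alone can make two otherwise identical outputs differ, here is a hedged Python sketch using the gzip case mentioned above. The helper names (`compress`, `gzip_mtime`, `differing_offsets`) and the two timestamps are assumptions chosen for illustration; the header layout follows the gzip format (RFC 1952), in which bytes 4 to 7 hold a little-endian modification time.

```python
import gzip
import io
import struct


def compress(payload, mtime):
    """Gzip a payload, stamping the header with the given build time."""
    out = io.BytesIO()
    with gzip.GzipFile(filename="", fileobj=out, mode="wb", mtime=mtime) as gz:
        gz.write(payload)
    return out.getvalue()


def gzip_mtime(blob):
    """Return the MTIME header field (seconds since the Unix epoch)."""
    if blob[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    # Header: ID1 ID2 CM FLG, then a 4-byte little-endian MTIME.
    (mtime,) = struct.unpack("<I", blob[4:8])
    return mtime


def differing_offsets(a, b):
    """List the byte offsets at which two equal-length blobs differ."""
    if len(a) != len(b):
        raise ValueError("blobs differ in length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


payload = b"identical documentation text\n"
first_build = compress(payload, mtime=1420070400)   # illustrative build time
second_build = compress(payload, mtime=1420156800)  # one day later

print(gzip_mtime(first_build), gzip_mtime(second_build))
print(first_build == second_build)
print(differing_offsets(first_build, second_build))
```

The compressed payload is the same in both outputs; every differing offset falls inside the four-byte MTIME field, yet that is enough to defeat a bit-for-bit comparison.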
Issues given special focus on the project's Contribute page include preventing automated documentation utilities from writing timestamps and preventing timestamps from being generated in PHP registry files. Portable Executable (Windows) binaries are also not yet reproducible, for the same reason.
When will all packages have reproducible builds?
Reproducible builds will not be ready for Debian 8 ("Jessie"), which entered its development freeze on November 5, 2014, in anticipation of release. The first release candidate was released on January 26, 2015, and the full release is speculated to follow in March 2015. The team behind this initiative feels that reproducible builds could be a release goal for Debian 9 ("Stretch"), with a full proposal to be submitted after the full release of Jessie.
What's your view?
Is binary reproducibility a security priority for your organization? Do you think other distributions should start similar initiatives for their package repositories, or should people this concerned with security just compile their own programs? Let us know your thoughts in the comments.
Disclaimer: TechRepublic, ZDNet, and Tech Pro Research are CBS Interactive properties.
James Sanders is a Tokyo-based programmer and technology journalist. Since 2013, he has been a regular contributor to TechRepublic and Tech Pro Research.