The story starts with Stephen M. Cabrinety, the Stanford University Libraries, and NIST’s National Software Reference Library (NSRL). Cabrinety collected more than 50,000 pieces of commercial software and nearly 300 functioning microcomputer systems, some dating back to the mid-1980s.
Stanford University Libraries acquired Cabrinety’s collection in 2009, fourteen years after Cabrinety died from Hodgkin’s lymphoma. The acquisition and preservation of the collection had been a dream of his.
One might wonder why NIST and the NSRL became involved. Truth be told, it’s their job. The agency has been tasked with collecting, archiving, and making available to public and private organizations verifiable forensic information on individual pieces of software. In fact, the NSRL is likely the largest publicly held repository of digital software in the world. The NIST press release “Digital Forensics Rescues Retro Video Games and Software” explains why the collection is important:
“NIST maintains this collection not to preserve cultural history but to provide a forensic tool for law enforcement and national security investigators. NIST runs every file in the NSRL through a hashing algorithm that generates a unique digital fingerprint for each file.”
To make that happen, the personnel at NSRL:
- Ensure the proper storage of the collected software packages: Software is purchased or donated by software manufacturers and other organizations. These packages (physical media and purely electronic) include multiple versions of various operating systems, database management systems, utilities, graphics images, component libraries, etc.
- Create the NSRL Database: The database contains detailed information about the files that make up the packages listed above.
- Provide adequate access: The goal is to enable easy access to the NSRL’s collection of software packages by interested researchers and organizations, both public and private.
- Maintain the Reference Data Set (RDS): This data set contains signatures and identifying information, but not the software files. The data includes manufacturer name, operating system information, product information, application type, and file storage information.
The engineers at NSRL then do something unique: They create cryptographic hash values (MD5 and SHA-1) of each file’s content as part of the RDS. This enables files to be identified even if the file name has been altered.
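The idea of fingerprinting a file by its content can be sketched in a few lines of Python. This is a minimal illustration using the standard `hashlib` module, not the NSRL's actual tooling; the function name `file_fingerprints` and the chunk size are assumptions made for the example.

```python
import hashlib


def file_fingerprints(path, chunk_size=65536):
    """Return (md5_hex, sha1_hex) computed over a file's content.

    Hypothetical helper for illustration only. The file is read in
    chunks so that large files do not have to fit in memory.
    """
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()
```

Because only the bytes inside the file are hashed, calling `file_fingerprints` on a file before and after renaming it returns the same pair of values.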
Figure A shows the process steps involved in archiving physical and digital software packages.
Law enforcement, government, and private organizations can then review computer files by matching them to the profiles in the RDS. This saves significant effort when determining which files are important as evidence on computers or file systems that have been seized as part of criminal investigations. From the NSRL website: “The RDS is a collection of digital signatures of known, traceable software applications.”
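The matching step described above might look roughly like the following sketch. It assumes the known hashes have already been loaded into an in-memory set of SHA-1 hex strings; the function names `sha1_of_file` and `split_known_unknown` are hypothetical, and the sketch makes no claim about how the RDS is actually distributed or queried.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file's content (illustrative helper)."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_known_unknown(root, known_sha1):
    """Split the files under `root` into (known, unknown) lists.

    `known_sha1` is an assumed stand-in for a reference data set: any
    iterable of SHA-1 hex strings. Case is normalized before comparing.
    """
    reference = {value.strip().lower() for value in known_sha1}
    known, unknown = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if sha1_of_file(path) in reference:
            known.append(path)
        else:
            unknown.append(path)
    return known, unknown
```

In this sketch, files whose hashes appear in the reference set can be set aside as known software, leaving a smaller list of unknown files for an examiner to review.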
A real-world example
To illustrate the RDS’s usefulness, the NIST press release mentions how the NSRL helped the FBI with its investigation into the tragic disappearance of Malaysia Airlines flight MH370 in 2014. “They [the FBI] wanted every hash of every file associated with every flight simulator we had,” said Doug White, the NIST computer scientist who runs the NSRL. “All the maps. All the routes. They wanted every flight path the pilot might have practiced on, so they could figure out where he might have gone.”
Additional uses for the NSRL RDS
Although the NSRL RDS was designed primarily to aid digital forensic examiners in their investigations, computer security professionals and cultural heritage communities are finding the NSRL platform useful.
Computer security professionals use the NSRL to:
- Find at-risk software installed on a computer
- Validate files using originals that are known to be safe
As for the cultural heritage communities, cultural preservationists are adopting the same methodologies NSRL personnel use to protect digital evidence in order to protect software in general.
A question that is not often considered is how a digital forensic scientist working on a criminal case can tell whether a software application with thousands of lines of code has been altered to hide incriminating evidence. Using NSRL tools, investigators can quickly determine whether the code has been doctored by comparing the hash of the suspect code against the RDS hash of the pristine original, saving time and effort.
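A one-to-one comparison of this kind can be sketched as follows. This is an illustration under stated assumptions rather than a description of NSRL tools: the reference value is taken to be a SHA-1 hex string for the pristine file, and the names `sha1_hex` and `is_unaltered` are hypothetical.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 hex digest of a byte string."""
    return hashlib.sha1(data).hexdigest()


def is_unaltered(suspect: bytes, reference_sha1: str) -> bool:
    """Return True if the suspect bytes hash to the reference value.

    `reference_sha1` stands in for a hash recorded from the pristine
    original; it is normalized to lowercase before the comparison.
    """
    return sha1_hex(suspect) == reference_sha1.strip().lower()


# Changing even one byte of the content produces a different hash.
pristine = b"original program bytes"
doctored = b"original program bytez"
reference = sha1_hex(pristine)
print(is_unaltered(pristine, reference))
print(is_unaltered(doctored, reference))
```

A mismatch shows only that the suspect file differs from the reference copy. It does not reveal where the file was changed or why, so it serves as a starting point for closer examination.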