ERROR: NTFS FILE SYSTEM (0x00000024) and the ensuing Blue Screen of Death (BSOD) can mean many things, usually the most painful being lost data. Thankfully, operating and file systems have improved over the years, and BSODs are less common; however, they’re still around and causing angst.
After a file-system crash, think how much better it would be to see “You must restart your computer. But, all of your data has been saved.”
Help is on the way
A Massachusetts Institute of Technology (MIT) press release by Larry Hardesty stated researchers at the university will present a paper on October 4, 2015 at the ACM Symposium on Operating Systems Principles about file-system crashes. The presentation should be of interest to everyone who uses a computer. It’s about the world’s first file system that’s mathematically guaranteed not to lose track of data during a crash.
“Although the file system is slow by today’s standards, the techniques the researchers used to verify its performance can be extended to more sophisticated designs,” added Hardesty.
File system vs. operating system
Since operating systems and file systems crash, to avoid confusion, let’s delineate between the two. Operating systems control hardware and software resources; file systems identify, store, and retrieve data.
The MIT research team focused on crashes affecting the file system. Nickolai Zeldovich, one of the three making the October presentation and an associate professor of computer science and engineering at MIT told Hardesty, “What many people worry about is building these file systems to be reliable, both when they’re operating normally but also in the case of crashes, power failures, software bugs, hardware errors, and what have you.”
It’s a challenging problem
Hardesty mentioned that figuring out how to recover from a file-system crash is complicated. There are so many different places the code could fail. “You have to consider every instruction or every disk operation and think what if I crash now…,” explained Zeldovich. “So empirically, people have found lots of bugs in file systems that have to do with crash recovery, and they keep finding them, even in very well tested file systems, because it’s just so hard to do.”
That said, the researchers feel they have a solution. Zeldovich along with Frans Kaashoek, professor of electrical engineering and computer science (EECS), Adam Chlipala, associate professor of computer science, Haogang Chen, a graduate student in EECS, and Daniel Ziegler, an undergraduate in EECS, used formal verification to prove their file system does not lose track of data during a crash.
Simply put, formal verification is the process of checking whether a design satisfies specified requirements (properties), and, in this case that means not losing data. “It’s a complicated process, so it’s generally applied only to very high-level schematic representations of a program’s functionality,” wrote Hardesty. “Translating high-level schema into working code, however, can introduce myriad complications that the proofs don’t address.”
To avoid those pitfalls, the MIT researchers ran a formal verification on the file system’s final code, which is much more difficult. “This formal proving environment includes a programming language,” Chlipala explained. “So we implement the file system in the same language where we’re writing the proofs. And the proofs are checked against the actual file system, not some whiteboard idealization that has no formal connection to the code.”
Never been done before
Because of their stipulations, the team needed to input much more data than typically required — for example, cataloging all the components of the file system, and then describing all component relationships that occur during a file-system crash. “In the course of writing the file system, they (researchers) repeatedly went back and retooled the system specifications,” said Hardesty. “In the course of writing the file system, they (researchers) repeatedly went back and retooled the system specifications,” said Hardesty. “But even though they rewrote the file system ‘probably 10 times,’ Zeldovich says, Kaashoek estimates they spent 90 percent of their time on the definitions of the system components and the relationships between them.”
Researcher Kaashoek summed it up by saying, “No one had done it. It’s not like you could look up a paper that says, ‘This is the way to do it.’ But now you can read our paper and presumably do it a lot faster.”
Their effort paid off
Diving deeper into what the MIT team accomplished would be above my pay grade. However, those who understand what this means have taken notice. Hardesty talked to Ulfar Erlingsson, lead manager of security research at Google. “It’s not like people haven’t proven things in the past,” noted Erlingsson. “But usually the methods and technologies, the formalisms that were developed for creating the proofs, were so esoteric and so specific to the problem that there’s hardly any chance there would be repeat work built on it.”
Erlingsson added, “But I can say for certain… this is stuff that’s going to get built on and applied in many different domains. That’s what’s so exciting.”
Note: TechRepublic and Tech Pro Research are CBS Interactive properties.