How many times have you been working on a Microsoft Office file only to have a network hiccup or system crash corrupt it? I've seen it plenty of times, and when there's no backup, clients and end users start sweating and using language that isn't safe for work.

A backup is the ideal solution, but when one isn't available and a third-party tool such as File Repair fails, you need a last resort. If the file was saved in .docx, .xlsx, or .pptx format, you may be able to recover the data (in some form). These three formats are ZIP archives, so a tool like 7-Zip can extract the archive's contents and let you browse the files inside to, with luck, recover some of the data. It's not perfect, but it works in certain situations, such as:

  • The application does not recognize the file’s format;
  • The file cannot be read by the associated application;
  • The default application cannot open the file; or
  • Low system resource (or out-of-memory) errors.
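Because .docx, .xlsx, and .pptx files are ZIP archives, you can check whether a damaged file still looks like one before reaching for 7-Zip. Here's a minimal Python sketch using only the standard library's zipfile module; the helper names looks_like_zip and list_members are hypothetical. One assumption worth flagging: zipfile needs the archive's central directory, which is stored at the end of the file, so a file cut short by an interrupted save may still start with the ZIP signature yet fail to list. 7-Zip may be more forgiving in that case.

```python
import zipfile
from typing import List, Optional

# Every ZIP local file header starts with these four bytes ("PK\x03\x04").
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_zip(path: str) -> bool:
    """True if the file begins with a ZIP local-file-header signature."""
    with open(path, "rb") as fh:
        return fh.read(4) == ZIP_MAGIC


def list_members(path: str) -> Optional[List[str]]:
    """List the archive's member names, or None if its central directory is unreadable."""
    try:
        with zipfile.ZipFile(path) as zf:
            return zf.namelist()
    except zipfile.BadZipFile:
        # For example, when a save was interrupted before the end of the file was written.
        return None
```

A True from looks_like_zip paired with None from list_members suggests an Office archive that was cut short, which is exactly the situation where a more tolerant extractor is worth a try.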

How to recover data using 7-Zip

  1. Install 7-Zip on your machine. Once it’s installed, 7-Zip will integrate into Explorer.
  2. Open Explorer and navigate to the location of the problem file.
  3. Right-click the file and select 7-Zip | Extract To NAME (where NAME is the file's name without its extension; 7-Zip creates a folder with that name and extracts the contents into it).
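The same extraction can be scripted. This sketch (extract_to_named_folder is a hypothetical name) imitates the folder naming of Extract To NAME and pulls members out one at a time, so a single damaged entry doesn't stop the rest from being saved. It is not a description of how 7-Zip works internally, and unlike 7-Zip it still requires an intact central directory.

```python
import zipfile
import zlib
from pathlib import Path
from typing import List, Tuple


def extract_to_named_folder(path: str) -> Tuple[List[str], List[str]]:
    """Extract an Office file into a sibling folder named after it.

    The folder name is the file name minus its extension, as with
    7-Zip's "Extract To NAME". Returns (extracted, failed) member names.
    """
    src = Path(path)
    dest = src.with_suffix("")  # e.g. report.docx -> report
    dest.mkdir(exist_ok=True)
    extracted: List[str] = []
    failed: List[str] = []
    with zipfile.ZipFile(src) as zf:  # needs an intact central directory
        for info in zf.infolist():
            try:
                zf.extract(info, dest)
                extracted.append(info.filename)
            except (zipfile.BadZipFile, zlib.error, EOFError):
                # Bad CRC, corrupt deflate data, or data cut short. An empty or
                # incomplete file may be left behind in dest for that member.
                failed.append(info.filename)
    return extracted, failed
```

For example, extract_to_named_folder("report.docx") would create a report folder next to the file and tell you which members came out cleanly and which did not.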

When extraction completes, you'll have a new folder whose contents depend on the type of file you extracted. Look for these folders and files:

  • word folder: contains document.xml (the document's text) and, if the document has embedded media, a media folder holding those files.
  • xl\worksheets: contains sheet[X].xml (the spreadsheet data of sheet X).
  • ppt\media: holds the media embedded in the PowerPoint presentation.
  • ppt\slides: contains slide[X].xml, the data for each slide.
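Inside the archive, these locations use forward slashes (word/document.xml rather than word\document.xml). The small sketch below picks the files listed above out of a list of member names, plus xl/sharedStrings.xml, where Excel normally keeps the text of text cells. The pattern table and find_content_parts are my own illustration, not part of Office or 7-Zip.

```python
import re
from typing import Dict, List

# Member-name patterns for the parts described above. Inside the archive the
# separators are forward slashes, even though Explorer shows backslashes.
CONTENT_PATTERNS = {
    "word_text": re.compile(r"word/document\.xml$"),
    "excel_sheets": re.compile(r"xl/worksheets/sheet\d+\.xml$"),
    "excel_strings": re.compile(r"xl/sharedStrings\.xml$"),
    "powerpoint_slides": re.compile(r"ppt/slides/slide\d+\.xml$"),
    "media": re.compile(r"(word|xl|ppt)/media/"),
}


def _natural_key(name: str) -> list:
    # Compare digit runs as numbers so slide10.xml sorts after slide2.xml.
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def find_content_parts(names: List[str]) -> Dict[str, List[str]]:
    """Group archive member names by the kind of content they usually hold."""
    found: Dict[str, List[str]] = {key: [] for key in CONTENT_PATTERNS}
    for name in names:
        for key, pattern in CONTENT_PATTERNS.items():
            if pattern.match(name):  # match() anchors at the start of the name
                found[key].append(name)
    for members in found.values():
        members.sort(key=_natural_key)
    return found
```

You could pass it zipfile.ZipFile(path).namelist(), or the files under an extracted folder converted with Path.relative_to(folder).as_posix().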

To test this, I intentionally corrupted a .docx file by interrupting a save. Neither Microsoft Word nor LibreOffice Writer could open the file. I then ran through the process outlined above and wound up with the folder 5Ebookreaders, with word\document.xml tucked away inside it. Microsoft Word could not open document.xml, but LibreOffice Writer could. Although the contents weren't perfect (Figure A), the document's text was there and could be extracted.
Figure A

The original document was two pages; the recovered document was four pages.

After I copied the paragraphs out of the XML document and pasted them into a new .odt file, my data were recovered.
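If you'd rather not copy and paste by hand, paragraph text can also be pulled out of word/document.xml directly. In WordprocessingML, paragraphs are w:p elements and visible text sits in w:t elements. This sketch uses a regular expression instead of an XML parser, because a parser such as xml.etree.ElementTree rejects a file that is cut off mid-tag. The trade-offs: it assumes the usual w: prefix that Word writes, it only reads document.xml (headers and footers live in separate parts), it skips tracked deletions, and text boxes may come out merged with neighboring paragraphs. The function name is hypothetical.

```python
import html
import re
from typing import List

# Visible text lives in <w:t> elements; attributes such as xml:space="preserve"
# may follow the tag name. Tags like <w:tab/> or <w:tc> do not match.
_TEXT_RUN = re.compile(r"<w:t(?:\s[^>]*)?>([^<]*)</w:t>")


def paragraphs_from_document_xml(xml: str) -> List[str]:
    """Pull paragraph text out of word/document.xml, even if the XML is cut off."""
    paragraphs = []
    # Each paragraph ends with </w:p>; the final chunk may be an unfinished
    # paragraph, whose complete <w:t> runs are still kept.
    for chunk in xml.split("</w:p>"):
        text = "".join(html.unescape(run) for run in _TEXT_RUN.findall(chunk))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

Read the file with something like Path("5Ebookreaders/word/document.xml").read_text(encoding="utf-8", errors="replace"), then paste the returned paragraphs into a new document.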

I tested the same method on a spreadsheet and didn't have as much luck. As you can see in Figure B, the spreadsheet's contents were not nearly as useful, but some of the data were available.
Figure B

At least one column contained usable data.

Interestingly, LibreOffice Calc was unable to open the .xml document, whereas Microsoft Excel could. If you compare the recovered spreadsheet with the original (Figure C), it's fairly clear that the data can be extracted.
Figure C

This is a tiny sample spreadsheet that was recovered.
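For spreadsheets, cell values can be read straight from the sheet XML. One wrinkle: text cells (t="s") usually store only an index into xl/sharedStrings.xml, so a sheet read on its own shows numbers where the text should be. This sketch uses ElementTree's XMLPullParser, fed the whole file and never closed, so every cell completed before a truncation or parse error is kept. The function names are hypothetical, and it assumes each cell carries an r reference (such as B3), which Excel normally writes.

```python
import xml.etree.ElementTree as ET
from collections.abc import Iterator
from typing import List, Tuple

# SpreadsheetML namespace used by .xlsx worksheet and shared-string parts.
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def _complete_elements(data: bytes, tag: str) -> Iterator[ET.Element]:
    """Yield every fully parsed <tag> element, stopping quietly at damage."""
    parser = ET.XMLPullParser(events=("end",))
    try:
        parser.feed(data)  # close() is never called, so a truncated tail is ignored
        for _event, elem in parser.read_events():
            if elem.tag == tag:
                yield elem
    except ET.ParseError:
        return  # malformed bytes: keep what was parsed before them


def shared_strings(data: bytes) -> List[str]:
    """Read xl/sharedStrings.xml; rich-text runs inside one <si> are joined."""
    return [
        "".join(t.text or "" for t in si.iter(NS + "t"))
        for si in _complete_elements(data, NS + "si")
    ]


def sheet_cells(sheet: bytes, strings: List[str]) -> List[Tuple[str, str]]:
    """Return (cell reference, value) pairs for every complete <c> element."""
    cells = []
    for c in _complete_elements(sheet, NS + "c"):
        ref, kind = c.get("r", "?"), c.get("t")
        v = c.find(NS + "v")
        raw = v.text if v is not None and v.text else ""
        if kind == "s" and raw.isdigit():
            idx = int(raw)
            # The shared-strings part may itself be damaged or truncated.
            value = strings[idx] if idx < len(strings) else f"<shared string #{idx} lost>"
        elif kind == "inlineStr":
            value = "".join(t.text or "" for t in c.iter(NS + "t"))
        else:
            value = raw  # numbers, booleans, errors, cached formula results
        cells.append((ref, value))
    return cells
```

The pull parser was chosen over ET.fromstring because the latter raises on the first problem and returns nothing at all, whereas here the cells parsed before the damage survive.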

Get back in business

This is not a precise method, and you should not depend on it. Your first line of attack for data recovery is always backups. If backups don't work, try restoring an earlier version of the containing folder. If that doesn't work, try third-party software.

When your last-ditch efforts fail, I encourage you to give this free method a try and see whether any data can be recovered. It might not be ideal (you might wind up having to re-create your document from pieces), but it could get you back in business.