As you may know, the electronic versions of documents created in Microsoft Office, more specifically Word, Excel, and PowerPoint, may contain hidden data that can inadvertently leak confidential or proprietary information to your company’s competitors or colleagues. Of course, I’m talking about metadata.
Designed to improve the document-creation process, especially when more than one person is involved, metadata can be an extremely useful feature. However, once the document-creation process is complete, metadata should be removed from the document since it’s no longer needed. Unfortunately, Microsoft doesn’t provide a simple, built-in method for doing so.
As such, it’s all too easy to simply forget that this potentially harmful metadata is hidden inside the finished document as you pass it along to colleagues or competitors via e-mail or other electronic media. A crafty recipient of such a metadata-laden document can very easily discover all sorts of details about what went on during the document-creation process. Some of these details could be related to confidential information or, at the very least, be embarrassing to your company if they fell into the wrong hands.
Fortunately, a number of companies have stepped up to solve this problem by creating special utilities designed to remove hidden metadata from Microsoft Office documents. One such company, Esquire Innovations, has created a unique product called iScrub that not only removes metadata from Word, Excel, and PowerPoint documents, but can also lock down a document so that the recipient can do little more than view and print it. Another unique feature in iScrub will allow you to perform a batch metadata removal operation on an entire folder of Microsoft Office documents.
A look at potentially harmful metadata
To begin, it’s important to understand that what can be considered the most potentially harmful metadata isn’t automatically added to Office documents by default. It’s actually added to documents when the user enables certain features that are designed to save time or to enhance the document-creation process. Since Word is one of the biggest culprits when it comes to useful features that can add metadata to documents, let’s use it as an example.
When you create documents in Word, enabling the Fast Save feature can be considered a time-saver because it’s designed to speed up the File Save operation. To do so, the Fast Save feature records only incremental changes in a document. In the process, the changes are appended to the document rather than overwriting the existing text. Word then stores all these appended changes in the document as hidden metadata and pieces them together each time you open the document. If not removed, this type of metadata could reveal all sorts of confidential details that were entered into the document and later changed.
If you’re collaborating on a document, it’s common practice to use Word’s Track Changes feature. When you do, Word keeps track of any text that is deleted, added, or otherwise modified and stores that information in the document as metadata. Under normal circumstances, these changes are either accepted or rejected and the metadata is removed from the document. However, Track Changes can be disabled without having to first accept or reject the changes. This would allow someone to later reenable Track Changes and see every change that was made to the document, including edits that were meant to remove confidential details.
As you can see from just these two examples, Word’s metadata can easily allow a document to contain information that could be detrimental or embarrassing to your company if it were to fall into the wrong hands. Keep in mind that there are many other features that add metadata to Word and other Microsoft Office documents.
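To make the Track Changes risk concrete, here is a minimal Python sketch of how a recipient might pull tracked insertions and deletions out of a document. It rests on an assumption that goes beyond this article: it reads the XML-based .docx format (Office Open XML), which postdates the binary formats used by the Office versions discussed here. The function name `tracked_changes` and the sample document are hypothetical and exist only for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_changes(docx_bytes):
    """Return the text of tracked insertions and deletions in a .docx file."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    # Tracked insertions live in <w:ins>, with their text in <w:t>.
    inserted = [
        "".join(t.text or "" for t in ins.iter(f"{{{W_NS}}}t"))
        for ins in root.iter(f"{{{W_NS}}}ins")
    ]
    # Tracked deletions live in <w:del>, with their text in <w:delText>.
    deleted = [
        "".join(t.text or "" for t in gone.iter(f"{{{W_NS}}}delText"))
        for gone in root.iter(f"{{{W_NS}}}del")
    ]
    return {"inserted": inserted, "deleted": deleted}


def build_sample_docx():
    """Build a tiny in-memory .docx part with one insertion and one deletion."""
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
        '<w:ins w:id="1" w:author="A"><w:r><w:t>new price</w:t></w:r></w:ins>'
        '<w:del w:id="2" w:author="A"><w:r><w:delText>old price</w:delText></w:r></w:del>'
        "</w:p></w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


sample_changes = tracked_changes(build_sample_docx())
print(sample_changes)
```

The point of the sketch is that text someone deleted while revising can still be sitting in the file, readable by anyone who knows where to look.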
In a previous article on metadata, “Clean potentially harmful metadata from Office documents with ezClean,” I went into a bit more detail on some of the features that can add potentially harmful metadata to a Word document. Beyond those features, metadata can also exist in templates, styles, links, macros, routing slips, and so on.
If you want a more comprehensive list of the Microsoft Office features that add metadata to documents, see Microsoft Knowledge Base article 223396, “How to Minimize Metadata in Microsoft Office Documents.” This article provides more information on metadata, as well as links to other articles that detail all the settings you may have to manually alter in order to minimize or remove metadata from documents created with Excel, Word, and PowerPoint versions 97, 2000, and 2002.
After you install iScrub, you’ll find that an iScrub toolbar is added to Word, Excel, and PowerPoint. You’ll also find a set of iScrub commands added to the File menu. Again, since Word is one of the biggest culprits when it comes to potentially harmful metadata, I’ll use it as an example as I look at how iScrub works.
The iScrub toolbar for Word contains four buttons, shown in Figure A, which I’ve labeled for discussion purposes.
Figure A: iScrub adds a toolbar that allows you to quickly and easily clean metadata out of a document.
A bit of background on iScrub’s legal origin
As you learn more about the iScrub product, keep in mind that it was originally developed for and is primarily marketed toward law firms. Therefore, a lot of the terminology in the documentation (and some in the program itself) refers specifically to the legal industry.
Publish As Scrubbed Document
When you click the Publish As Scrubbed Document button, you’ll see the dialog box shown in Figure B. As you can see, the default Scrub level setting is Cooperator. The other Scrub level setting is Adversary. As you can imagine, the Cooperator level is designed to remove less metadata than the Adversary level. The amount and types of metadata removed by each of these levels are specified in the metadata removal settings configuration file, which is stored in XML format. You can use any XML editor or even Notepad to edit the file and enable or disable the various settings.
Figure B: When you click the Publish As Scrubbed Document button, you’ll see this dialog box.
While iScrub is installed on individual workstations, each installation can be directed to retrieve metadata removal settings from a network-based configuration file. As such, administrators can establish centrally managed standards that are used throughout the enterprise to control how metadata is dealt with in all Microsoft Office documents.
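To give a feel for how an XML file could express the two Scrub levels, here is a minimal Python sketch. I don't have iScrub's actual schema in front of me, so every element name, attribute name, and setting name below is a hypothetical stand-in. Only the level names Cooperator and Adversary come from the product.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration. This is NOT iScrub's real schema; the element
# and attribute names are invented to illustrate the idea.
SAMPLE_CONFIG = """
<scrubSettings>
  <level name="Cooperator">
    <setting name="TrackedChanges" remove="true"/>
    <setting name="Comments" remove="true"/>
    <setting name="DocumentProperties" remove="false"/>
  </level>
  <level name="Adversary">
    <setting name="TrackedChanges" remove="true"/>
    <setting name="Comments" remove="true"/>
    <setting name="DocumentProperties" remove="true"/>
  </level>
</scrubSettings>
"""


def load_levels(xml_text):
    """Map each scrub level name to a dict of {setting name: remove?}."""
    root = ET.fromstring(xml_text.strip())
    levels = {}
    for level in root.findall("level"):
        levels[level.get("name")] = {
            setting.get("name"): setting.get("remove") == "true"
            for setting in level.findall("setting")
        }
    return levels


levels = load_levels(SAMPLE_CONFIG)
for level_name, settings in levels.items():
    removed = sorted(name for name, remove in settings.items() if remove)
    print(level_name, "removes:", ", ".join(removed))
```

Because the levels live in one file, pointing every workstation at a single shared copy is enough to enforce the same standard everywhere, which is the appeal of the network-based approach.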
When iScrub cleans the metadata out of a document, you can use the settings available on the Save As drop-down to configure how you want iScrub to treat the resulting document. You can have it save the scrubbed document as a new Word document, save it as a new RTF document, or overwrite the original document.
When you choose the New Document setting in the Save As drop-down, in addition to cleaning the metadata out of a document, you have the option of locking the document by selecting the Apply MetaSealant check box. When you choose this option, the resulting document can be viewed and printed, but it cannot be edited in any way—you can’t even copy text.
MetaSealant is an awesome feature
When I was evaluating iScrub and began experimenting with the MetaSealant feature, I was very impressed. Using the MetaSealant feature on a Word document is analogous to creating a PDF file, except that the document remains in Word format. The only downside that I encountered was that, unlike PDF files, MetaSealed Word documents are not searchable. In other words, the document is locked down so tightly that Word’s Find tool is disabled.
On the User Preferences tab, you’ll find a few additional settings that will allow you to further customize the cleaning procedure. These additional settings will vary depending on the selected Scrub level and the options specified in the configuration file.
Active Document Scrub
When you select the Active Document Scrub button, iScrub will automatically clean the metadata from a document based on the default settings in the configuration file without prompting the user. The Active Document Scrub feature can also be configured to run automatically, for example each time the document is saved.
View Metadata
At any point, you can click the View Metadata button and take a detailed look at all of the metadata contained in the document. The metadata is displayed in a multitabbed dialog box, like the one shown in Figure C.
Figure C: The View Metadata feature uses a multitabbed dialog box to break down the various pieces of metadata contained in a document.
At the bottom of the View Metadata dialog box, you’ll see the Create Metadata Schema button. Clicking this button produces a very detailed report in a new Word document that includes all of the information displayed in the View Metadata dialog box.
Check out iDiscover
To help you evaluate the danger posed by metadata in your company, Esquire Innovations has made the View Metadata component of iScrub available as a stand-alone tool called iDiscover, which you can download and install for free. Once you install iDiscover, you can use it to quickly take a look at all the metadata hidden in your Word documents. For more details and the download, check out the iDiscover page on the Esquire Innovations Web site.
Set Global Options
The last button on Word’s iScrub toolbar allows an administrator to configure many of the features found in Word’s Options dialog box that enable metadata in a document. If you’re administering iScrub’s network configuration file, you can use the options available via the Set Global Options tool to specify what settings you want to make available to users.
As you would imagine, iScrub can be integrated into Microsoft Outlook so that any Microsoft Office document that is attached to an e-mail message is examined once the Send button is clicked. At that time, iScrub will display an iScrub Attachment Alert dialog box like the one shown in Figure D.
Figure D: If iScrub detects a Microsoft Office document that is attached to an e-mail message, it will prompt you to clean the document before sending it.
In addition to working with Microsoft Outlook, iScrub can also be integrated into Novell GroupWise 5.5 and above.
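The general idea behind an attachment check like this can be sketched in a few lines of Python. This is not how iScrub hooks into Outlook or GroupWise, which I have no visibility into. It simply uses Python's standard `email` package to flag Office attachments on an outgoing message, and the function name `office_attachments` and the sample message are hypothetical.

```python
from email.message import EmailMessage
from pathlib import PurePath

# File extensions for the Word, Excel, and PowerPoint formats discussed here.
OFFICE_EXTENSIONS = {".doc", ".xls", ".ppt"}


def office_attachments(message):
    """Return the file names of attachments that look like Office documents."""
    names = []
    for part in message.iter_attachments():
        filename = part.get_filename()
        if filename and PurePath(filename).suffix.lower() in OFFICE_EXTENSIONS:
            names.append(filename)
    return names


# Build a sample outgoing message with one Office and one non-Office attachment.
message = EmailMessage()
message["Subject"] = "Draft contract"
message.set_content("Please review the attached draft.")
message.add_attachment(
    b"placeholder bytes",
    maintype="application",
    subtype="msword",
    filename="contract.doc",
)
message.add_attachment("Meeting notes", filename="notes.txt")

flagged = office_attachments(message)
if flagged:
    print("Clean before sending:", ", ".join(flagged))
```

A real integration would run a check like this at the moment the Send button is clicked and then offer to clean each flagged file.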
The SpinCycle tool
If you have multiple Microsoft Office documents from which you want to remove metadata, you don’t have to perform individual operations on each file. Instead, you can use the iScrub SpinCycle tool, which is available on the Start menu as an individual utility. The SpinCycle tool provides you with all the features you need to select groups of files and configure the level of metadata cleaning, as shown in Figure E.
Figure E: You can use the SpinCycle tool to easily remove metadata from multiple Microsoft Office documents.
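A batch operation of this kind boils down to selecting the Office files in a folder and applying the same cleaning step to each one. The Python sketch below shows that pattern under stated assumptions: `collect_office_files` and `batch_clean` are hypothetical names, and `clean_one` is a placeholder callback standing in for whatever tool actually removes the metadata. It does not call iScrub or SpinCycle.

```python
import tempfile
from pathlib import Path

# File extensions for the Word, Excel, and PowerPoint formats discussed here.
OFFICE_EXTENSIONS = {".doc", ".xls", ".ppt"}


def collect_office_files(folder):
    """Return the Office documents found directly inside a folder, sorted."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in OFFICE_EXTENSIONS
    )


def batch_clean(folder, clean_one):
    """Apply a cleaning callback to every Office file; return results by name."""
    return {path.name: clean_one(path) for path in collect_office_files(folder)}


# Demonstration with empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ("brief.doc", "budget.xls", "pitch.PPT", "readme.txt"):
        (Path(folder) / name).touch()
    # The callback here only reports the file; a real one would clean it.
    batch_results = batch_clean(folder, lambda path: f"would clean {path.name}")

for name, outcome in batch_results.items():
    print(name, "->", outcome)
```

Keeping the file selection separate from the cleaning step means the same loop works regardless of which cleaning level or tool sits behind the callback.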
Getting a copy of iScrub
For pricing information and details on purchasing iScrub, contact Esquire Innovations’ Sales department at (909) 506-5641, ext. 203, or via e-mail at firstname.lastname@example.org. You can also contact Esquire Innovations by using the inquiry form on the Contact page. In addition, you can download a product brochure from the iScrub page.
iScrub vs. ezClean
As I mentioned, I previously took a look at another metadata removal tool called ezClean from KKL Software. When comparing iScrub and ezClean, I’d have to say that they both do an excellent job of removing metadata from Word, Excel, and PowerPoint. And they both have e-mail components that recognize Microsoft Office documents as attachments.
When it comes to choosing the level of metadata removal on an individual document, I prefer ezClean’s comprehensive dialog box over iScrub’s broad Cooperator or Adversary options. On the other hand, when it comes to managing metadata removal standards throughout the enterprise, I prefer iScrub’s network-based XML file approach over ezClean’s distributed INI file approach.
Now, when it comes to options that go above and beyond the call of duty, iScrub is clearly ahead of the game with its MetaSealant and SpinCycle features. As far as product availability goes, ezClean takes the lead. Being able to download its fully functional 45-day trial copy allows you to instantly begin using and evaluating the product.
Greg Shultz is a freelance Technical Writer. Previously, he worked as a Documentation Specialist in the software industry, a Technical Support Specialist in the educational industry, and a Technical Journalist in the computer publishing industry.