How do I... scan a hard drive for sensitive data with Spider?

Large capacity hard drives can be difficult to scan without the right tool and one of the better tools for scanning hard drives is Spider from Cornell University. Jack Wallen shows you how to use Spider 3.

There are many reasons why you would want to do a thorough scan on a PC for specific data. You could be recycling computers, bringing in new employees (to take over previous employees' machines), or simply removing sensitive information from a permanently networked machine. Regardless of your reason, a 120GB hard drive is a large drive to manually search for strings of data. But with the help of Cornell University's Spider tool, this task becomes quite a bit easier.

Spider works by scanning archive, normal, compressed, and temporary files (so long as the file isn't locked for use or encrypted) for data types such as U.S. Social Security numbers, Canadian Social Security numbers, credit card numbers, U.K. National Health Insurance numbers, and any data type for which the user supplies a regular expression. Spider can be run in two different ways: GUI and command line. And best of all, Spider is open source and crossplatform (Windows, OS X, UNIX.)

This blog post is also available in PDF format in a TechRepublic download.

Getting and installing

You first need to download the correct binary package (which includes the source) from the download Cornell University security tools page. For Windows you will be downloading a compressed .zip archive. Uncompress that file, and you will have a new directory called "Spider_release." Inside this folder is a README, a installation binary, and a directory containing the source code. Double-click on the installer package to install Spider 3.

The installation is a no-brainer. Just let it do its thing, and you will wind up with a new entry in your Start menu. This entry, Spider 3, contains three subentries:

  • RegexLibraryBuilder.exe
  • spider_3.0.exe, and
  • SpiderRegConvert.exe.

Starting Spider 3

From the Spider 3 menu, click the spider_3.0.exe entry to fire up Spider 3. The first window you will see is the main window (there is no initial configuration). Figure A shows the main window ready for a scan.

Figure A

Not much to it on the outside. It's what's on the inside that counts.

If you click Run Spider, you are going to initiate a default scan that will scan drive and network shares for strings matching: 15-string credit card numbers and U.S. social security numbers. This scan will create a log on your local drive (it is critical that this file be deleted when you are finished examining Spiders' findings).

So click Run Spider. The window will only change by showing what file the application is scanning (see Figure B).

Figure B

If Spider is taking a long time on a particular file, you can skip that file by hitting the Esc key.

During the scan you will probably notice when Spider locates any multimedia files because it will slow down. This is only because of the size of the file. As stated above you can skip this file by hitting the Esc key. If you have a lot of these, this process can be a pain. Fortunately Spider 3 has a way around this.

Configuring Spider 3

From the main window, click on the Configure menu and select the only entry: Settings. From this window (Figure C) you can take care of every possible Spider configuration you could hope for.

Figure C

Any time you feel you have monkeyed with the options beyond recognition you can reset to default.
Say you do not want Spider 3 spending too much time with your music collection (and any file associated with said collection). To avoid this, you will want to go to the File Extension Management tool. To get there, click on the Scan Options tab and then click the File Extension Management button (see Figure D).

Figure D

As you can see the default skip list is fairly lengthy.
By default most media extensions are already included in the skip list. But say you have another type of file (or even an in-house file type) that you want to skip. To add a new extension to skip is simple. Click on the Add button under File Extensions to Skip, which will open up a new window (Figure E).

Figure E

Once you have added the new extension, click OK and the window will close.

Naturally, depending on the size of the drive and the amount of files on the drive, the scan can take quite some time. But once the scan is done, the log viewer will open to show you the complete results of the scan.

Viewing the results

Once the scan is complete, the Spider 3 log viewer will automatically open. This log viewer is a very helpful tool in that it gives you instant information on each file and what hit type Spider 3 has found. Take a look at Figure F. You will see a number of files that drew flags from Spider 3.

Figure F

I actually had more hits than I thought I would.

When you highlight a suspected file, below the file listing you will see all the information you will need to have. In the example above you can see that the file klein.pdf is flagged with a credit card number. I happen to know this is a false positive, so I can ignore that file. However there were file listings (not shown) that did have bank account information. Those files had been backed up, and their location was mostly obfuscated. So I most likely would have completely forgotten of their existence. Thanks to Spider 3 I can delete them.

Taking action

To take action on a file (which basically means to delete the file), you do not have to open up Explorer and navigate to said file. Instead you can simply highlight the file within the log viewer and click the Erase or Delete File button.

Now the Run button is interesting. Say the file flagged has an associated application (for example Adobe Reader for PDF files). If you have a PDF file highlighted, clicking the Run button will open that highlighted file in Adobe Reader. This is a quick way to view the file to make sure Spider hasn't hit a false positive.

Final thoughts

Without applications like Spider 3 many people would be exchanging PC hard drives with very sensitive data on them. But thankfully applications like this do exist and they are simple to use. I would highly recommend Spider 3 to any IT admin (or even home user) who wants to make sure sensitive data is not found on their hard drives.


Jack Wallen is an award-winning writer for TechRepublic and He’s an avid promoter of open source and the voice of The Android Expert. For more news about Jack Wallen, visit his website

Editor's Picks