How do I... scan a hard drive for sensitive data with Spider?

Large-capacity hard drives can be difficult to scan without the right tool, and one of the better tools for the job is Spider from Cornell University. Jack Wallen shows you how to use Spider 3.

There are many reasons why you would want to do a thorough scan on a PC for specific data. You could be recycling computers, bringing in new employees (to take over previous employees' machines), or simply removing sensitive information from a permanently networked machine. Regardless of your reason, a 120GB hard drive is a large drive to manually search for strings of data. But with the help of Cornell University's Spider tool, this task becomes quite a bit easier.

Spider works by scanning archive, normal, compressed, and temporary files (so long as the file isn't locked for use or encrypted) for data types such as U.S. Social Security numbers, Canadian Social Insurance numbers, credit card numbers, U.K. National Health Insurance numbers, and any data type for which the user supplies a regular expression. Spider can be run in two different ways: from the GUI or from the command line. Best of all, Spider is open source and cross-platform (Windows, OS X, UNIX).
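To give a feel for what regular-expression matching on sensitive data looks like, here is a minimal Python sketch. It is not Spider's code: the pattern names, the expressions themselves, and the find_hits helper are all assumptions made for illustration, and Spider's built-in expressions may well differ.

```python
import re

# Hypothetical patterns for illustration only; Spider's own
# regular expressions are not reproduced here and may differ.
PATTERNS = {
    # Three digits, two digits, four digits, separated by hyphens.
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # Sixteen digits, each optionally followed by a space or hyphen.
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}


def find_hits(text):
    """Return a list of (pattern_name, matched_text) pairs found in text."""
    hits = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = "Employee SSN 123-45-6789, card 4111 1111 1111 1111."
    for name, value in find_hits(sample):
        print(name, value)
```

Simple patterns like these match anything with the right shape, which is one reason a scanner of this kind can report false positives.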

This blog post is also available in PDF format in a TechRepublic download.

Getting and installing

You first need to download the correct binary package (which includes the source) from the Cornell University security tools download page. For Windows you will be downloading a compressed .zip archive. Uncompress that file, and you will have a new directory called "Spider_release." Inside this folder are a README, an installation binary, and a directory containing the source code. Double-click on the installer package to install Spider 3.

The installation is a no-brainer. Just let it do its thing, and you will wind up with a new entry in your Start menu. This entry, Spider 3, contains three subentries:

  • RegexLibraryBuilder.exe
  • spider_3.0.exe, and
  • SpiderRegConvert.exe.

Starting Spider 3

From the Spider 3 menu, click the spider_3.0.exe entry to fire up Spider 3. The first window you will see is the main window (there is no initial configuration). Figure A shows the main window ready for a scan.

Figure A

Not much to it on the outside. It's what's on the inside that counts.

If you click Run Spider, you will initiate a default scan of drives and network shares for strings matching 15-string credit card numbers and U.S. Social Security numbers. This scan will create a log on your local drive (it is critical that this file be deleted when you are finished examining Spider's findings).

So click Run Spider. The window will only change by showing what file the application is scanning (see Figure B).

Figure B

If Spider is taking a long time on a particular file, you can skip that file by hitting the Esc key.

During the scan you will probably notice when Spider locates any multimedia files because it will slow down. This is only because of the size of the file. As stated above you can skip this file by hitting the Esc key. If you have a lot of these, this process can be a pain. Fortunately Spider 3 has a way around this.

Configuring Spider 3

From the main window, click on the Configure menu and select the only entry: Settings. From this window (Figure C) you can take care of every possible Spider configuration you could hope for.

Figure C

Any time you feel you have monkeyed with the options beyond recognition you can reset to default.

Say you do not want Spider 3 spending too much time with your music collection (and any file associated with said collection). To avoid this, you will want to go to the File Extension Management tool. To get there, click on the Scan Options tab and then click the File Extension Management button (see Figure D).

Figure D

As you can see, the default skip list is fairly lengthy.

By default most media extensions are already included in the skip list. But say you have another type of file (or even an in-house file type) that you want to skip. Adding a new extension to skip is simple. Click on the Add button under File Extensions to Skip, which will open up a new window (Figure E).

Figure E

Once you have added the new extension, click OK and the window will close.
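The idea behind a skip list can be sketched in a few lines of Python. This is only an illustration of the concept, not how Spider implements it: the SKIP_EXTENSIONS set and the should_skip and files_to_scan helpers are hypothetical names, and the extensions shown are just examples.

```python
import os

# Hypothetical skip list for illustration; Spider's default list is longer.
SKIP_EXTENSIONS = {".mp3", ".wav", ".avi", ".mpg"}


def should_skip(path, skip_extensions=SKIP_EXTENSIONS):
    """Return True if the file's extension (case-insensitive) is in the skip list."""
    extension = os.path.splitext(path)[1].lower()
    return extension in skip_extensions


def files_to_scan(root, skip_extensions=SKIP_EXTENSIONS):
    """Walk the tree under root and yield every file that is not skipped."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            if not should_skip(full_path, skip_extensions):
                yield full_path


if __name__ == "__main__":
    # Add an in-house extension to the skip list, then list what remains.
    custom_skip = SKIP_EXTENSIONS | {".xyz"}
    for path in files_to_scan(".", custom_skip):
        print(path)
```

Skipping by extension is fast because the file never has to be opened, but it relies on the extension being an honest description of the file's contents.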

Naturally, depending on the size of the drive and the number of files on it, the scan can take quite some time. But once the scan is done, the log viewer will open to show you the complete results of the scan.

Viewing the results

Once the scan is complete, the Spider 3 log viewer will automatically open. This log viewer is a very helpful tool in that it gives you instant information on each file and what hit type Spider 3 has found. Take a look at Figure F. You will see a number of files that drew flags from Spider 3.

Figure F

I actually had more hits than I thought I would.

When you highlight a suspected file, you will see all the information you need below the file listing. In the example above you can see that the file klein.pdf is flagged with a credit card number. I happen to know this is a false positive, so I can ignore that file. However, there were file listings (not shown) that did have bank account information. Those files had been backed up, and their location was mostly obfuscated, so I most likely would have completely forgotten about their existence. Thanks to Spider 3 I can delete them.
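One common way to weed out credit card false positives is the Luhn checksum, which genuine card numbers satisfy. The sketch below is a generic Python illustration of that check; whether Spider applies it internally is not stated here, and the luhn_valid name and the 13-digit minimum length are assumptions.

```python
def luhn_valid(candidate):
    """Return True if the digits in candidate pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    # Assumption: treat anything shorter than 13 digits as not a card number.
    if len(digits) < 13:
        return False
    total = 0
    # Walk the digits right to left, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    for value in ("4111 1111 1111 1111", "4111 1111 1111 1112"):
        print(value, luhn_valid(value))
```

A string that fails the checksum is very unlikely to be a real card number, though a string that passes still needs a human look, which is what the Run button in the log viewer is for.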

Taking action

To take action on a file (which basically means to delete the file), you do not have to open up Explorer and navigate to said file. Instead you can simply highlight the file within the log viewer and click the Erase or Delete File button.

Now the Run button is interesting. Say the file flagged has an associated application (for example Adobe Reader for PDF files). If you have a PDF file highlighted, clicking the Run button will open that highlighted file in Adobe Reader. This is a quick way to view the file to make sure Spider hasn't hit a false positive.

Final thoughts

Without applications like Spider 3 many people would be exchanging PC hard drives with very sensitive data on them. But thankfully applications like this do exist and they are simple to use. I would highly recommend Spider 3 to any IT admin (or even home user) who wants to make sure sensitive data is not found on their hard drives.


Jack Wallen is an award-winning writer for TechRepublic. He's an avid promoter of open source and the voice of The Android Expert. For more news about Jack Wallen, visit his website.


As part of PCI compliance I was required to scan our systems for card data. We tried Spider and Senf and found that both did a basic job; however, the reports were full of incorrect matches and took a long time to read through. We ended up using a different app called Card Recon, which was more appropriate for PCI compliance given its higher accuracy. If you need to comply with PCI, I'd suggest looking at it. If you're just looking for something for personal use, you can't complain about Spider's price.


Some of us are sufficiently paranoid that, given the availability of inexpensive high-capacity drives, we simply physically destroy the old, usually smaller, drives. You end up with no security problem and some really nifty magnets for your workshop or refrigerator.

Data Ninja

Although this is an excellent tool, it's missing one aspect of sending or giving a hard drive to someone: the blank or empty areas of the media. It needs to be capable of scanning the empty/deleted areas of the media so that any lingering information out there can be safely wiped; otherwise it can still be found. Any time you give away or dispose of any media, it's important to use either the manufacturer's tools to write zeroes to it or a wipe program that complies with the DoD standards for erasing media.

Mark W. Kaelin

Spider is one of many tools you can use to scan hard drives for specific data. What other tools do you use? Does your organization recycle old hard drives or give them to charities? Or does your organization destroy those drives to protect data integrity?


There is an application that overwrites empty/deleted areas and works on multiple systems and HDDs simultaneously.


I've worked for numerous companies that degaussed their drives for drive integrity. It just depends on the nature of the business. I've worked at other places where just a simple ghost image (reloading the OEM OS) was applied. I've also worked at a company that donated old computers to schools. Reloading the drives via imaging was the method of choice there as well.

Merlin the Wiz

I do not search for sensitive data. If I suspect there is sensitive data on an HDD, I use the UCSD CMRR HDDerase utility from here: I also use it on every drive I remove or install. You MUST be sure you want to wipe the HDD, because it uses tell-me-twice logic and a possible reboot (if the PC BIOS is locked to restrict HDD access) before it starts the erasure process. It has never failed unless the HDD was password locked and I could not get the password. After an HDD is erased with this utility, Windows XP Professional will see the drive but may not be able to partition or format it. MS-DOS 7.0 will partition it, as will most versions of Linux. Then, depending on the size of the drive, it may be necessary to change the partition size for a specific use.


We use the licensed version of Active KillDisk, using the DoD multi-pass method, before shipping the drive back on lease return or reissuing the PC to another user.

Lost Cause?

I work for a school district in the great Southwest, in New Mexico. Here, we use DoD-compatible software to erase hard drives before the computers are sold at auction.


Merlin, great link. I've always used the free Darik's Boot and Nuke utility to wipe drives. It's nice to have more than one in my arsenal. Here's that link for those of you who are not aware. Additionally, I agree: Spider 3 is a great tool for forensic investigations, but the log file it leaves behind must be deleted, and when you delete the log file it's not really deleted. You still need to run the HDD erase utility to really make the log file go away. Great stuff.
