
Data spring cleaning tips for SMBs

Andy Moon shares his seven-step process for getting rid of unnecessary files in bulk without the danger of losing any data. Tell us about your data spring cleaning best practices.

Last week, the Oracle Application Users Group released the results of a study in which 87% of respondents blamed data growth for their performance issues. As I opined recently, I think it could be very good for IT if users culled data that they don't need in order to reduce stress on storage and backup infrastructures. A recent PC World article suggests that users who don't do such cleaning regularly may be costing their companies a lot of money.

On the infrastructure side of things, there are a lot of technologies aimed at trying to help organizations spend less on storage. Deduplication technologies remove redundant information on storage devices so that things like operating system files or presentations that exist on many users' home drives will only take up space once. Multi-tiered storage allows the most critical data to be stored on high-speed, expensive hardware while less crucial or less frequently used data resides on slower, cheaper hardware.
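To make the deduplication idea concrete, here is a minimal sketch that finds files with identical contents under a folder by hashing them. It is only an illustration of the principle, not how any storage product implements it; the function names `file_digest` and `find_duplicates` are hypothetical, and file-level SHA-256 hashing is an assumption chosen for simplicity.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest()


def find_duplicates(root):
    """Map each content digest to the paths sharing it (two or more files).

    Hypothetical helper for illustration: walks `root` and groups files
    whose contents hash to the same value.
    """
    by_digest = defaultdict(list)
    for folder, _subdirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            by_digest[file_digest(path)].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

Each group returned is a set of files that a deduplicating store would, in principle, need to keep only once.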

Unfortunately, the IT-centered solutions leave us with the same problem: Data growth is explosive and nearly unchecked in all industries. Granted, there are many good reasons to keep a lot of this data, including regulatory requirements, files that need to be quickly accessible, and files that are accessed frequently. However, there are many files, particularly on users' desktop PCs, that are simply irrelevant, old garbage and should be treated as such. In the business world, though, most of us have too much work to do to spend the necessary time cleaning our data.

To help, I am posting my strategy for getting rid of unnecessary files in bulk without the danger of losing any data. This is what I teach my users to do, and we have been able to put off adding server space simply by reducing file volume on our servers.

  1. Get a CD or DVD burner for your PC.
  2. Burn all of your data files to disc using the burner you acquired.
  3. Go through the media you just burned to make sure the burn was successful.
  4. Delete everything you just burned from your PC.
  5. Browse through the media you just burned and copy back to your PC only the files you absolutely know you will need in the next day or two.
  6. Keep the media in your drive for a few weeks and, when you need a file that is on the media, copy it back to your PC.
  7. Label the disc with the date (I do this yearly, so I label mine "Clean - 2009").
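Step 3 is the one that protects you before the delete in step 4, so it is worth doing carefully. As a rough sketch, assuming the burned disc is mounted as an ordinary folder that mirrors the source folder's layout, a byte-for-byte comparison could look like the following. The function name `unverified_files` is hypothetical and this is not part of my original process, just one way to automate the check.

```python
import filecmp
import os


def unverified_files(source_root, media_root):
    """Return relative paths under source_root that are missing from
    media_root or whose contents differ from the copy on the media.

    Assumption: the media preserves the same folder structure as the source.
    """
    problems = []
    for folder, _subdirs, names in os.walk(source_root):
        for name in names:
            source_path = os.path.join(folder, name)
            relative = os.path.relpath(source_path, source_root)
            media_path = os.path.join(media_root, relative)
            if not os.path.isfile(media_path):
                # File never made it onto the disc.
                problems.append(relative)
            elif not filecmp.cmp(source_path, media_path, shallow=False):
                # shallow=False forces a comparison of the actual contents.
                problems.append(relative)
    return sorted(problems)
```

Only when the returned list is empty would it be reasonable to move on to deleting the originals.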

For Outlook, the process is a little different. I keep a year of email in a PST on my desktop. Every year, I take the previous year's PST, archive it to CD or DVD, and close the PST in Outlook. So, at the end of 2009, I archived all of my 2009 email, created a new PST for 2010, burned my 2008 email, and then deleted the 2008 PST.

Using this strategy, I still have access to my old email and files if I need them, but they aren't taking up space on my hard drive, a network drive, or any backup medium. As a result, my inbox is a little over 13 MB, last year's PST is just under 500 MB, and My Documents is under 50 MB. I am also secure in the knowledge that if someone shows up needing an email or file from two years ago (when I started my current job), I have access to it.

What data spring cleaning best practices do you recommend? Do you think expecting users to clean up their own data is realistic? Share your tips and your thoughts in the discussion.


17 comments
plangite

Risky business! Don't forget about Murphy's Law. Maybe you can say I am a little paranoid, but why not make a copy of the DVD? This media is a little fragile, and I'm certainly not happy thinking that a lot of work is on there. Just my two cents!

blackepyon01

I like how Windows Deployment Services utilizes this technology. I may make several images of a system before I'm ready to deploy it: one for a base install of OS and drivers, the next with Office and Windows updates, etc. At several gigs an image, that adds up quickly on the server, which is redundantly backed up on a rotation of removable drives. We used to use Ghost for imaging. With WDS, it only keeps one copy of the common files in an image group, and makes separate image files with just what changes between them. Saves a lot of space. With Ghost, there would be multiple 2 GB files for each image, which adds up very fast.

Excelmann

Just like your regular data backup: don't rely on it until you have performed the acid test. We use vaulting but have seen helpdesk people misconfigure it on PCs, and later what was thought to exist could not be recreated.

The 'G-Man.'

Come on: you started talking industry level and ended up with a mom & pop solution. A CD or DVD burner will not cope with the levels of data in a place that uses or needs deduplication technologies.

avinashjha84

It's really a very good thing to save your data without affecting your hard drive.

netter007

Wow, those are great tips. Thanks, this is useful knowledge for me and everyone who reads it.

CrypticDancer

Burning a CD or DVD or using a USB stick may well work in a small department, but in a government org (and I'd suggest it's the same for a commercial organisation) you need enterprise software; anything else is suicide. We run Enterprise Vault for e-mail and Commvault for backups. We are about to use Commvault's file archiving and deduplication, and I wish we'd used the Commvault e-mail archive also. We force 'My Documents' to the user's home drive, encourage 'make available offline' for encrypted laptops, and also restrict USB access. USB sticks should be considered a temporary medium. If data is leaving on any media, we need to have an audit trail. We are also looking at restricting the creation of PST files. I'd like to enforce quotas but can't get management buy-in yet. We have a security team that regularly scans for material that breaches policy, which keeps the photo albums off my storage. I guess you pitch your strategy to your scale; I have 12 TB on 6 file servers, not to mention the application servers. I agree with Discovery: getting hit by a Freedom of Information request or other legal search means you need control. I am not sure what angle the author is taking in suggesting CD or DVD as an alternative to IT-centered solutions, leaving users to decide what sits on a server and what they back up. Also, data on users' desktops means support would have to pussyfoot around users, copying data back and forth if forced to re-image a desktop. I would not like to expose my career by being dependent on a DVD or CD. More suitable for home use, I suggest.

Discovery

I hope that your organization is not subject to litigation or regulatory compliance audits. Your proposal for creating CDs/DVDs or other non-organizationally managed storage of information may create nightmares of epic proportion when and if your organization is involved in litigation or a regulatory audit that impacts your role in the organization. If nothing else, the discovery costs will go through the roof when it is discovered that there are uncontrolled file archives "out in the wild". The discovery costs could easily go from hundreds of thousands of dollars to millions of dollars. I have seen it happen. Secondly, you may be in violation of your organization's record retention and destruction policy, which opens the organization up to allegations of spoliation and possible summary judgments or adverse inferences. This could cost your organization millions of dollars; it has happened. In addition, the information you retain may have the "smoking gun" that opens your organization up to a severe regulatory settlement or the loss of a legal case. This could cost your organization millions of dollars. The statistics are quite high regarding litigation that was lost because information exists that should have been destroyed. In summary, you may be assuming the accountability for a severe outcome that would not have otherwise occurred. A better approach is for your organization to have record retention and destruction policies with the archiving policies and technology to enforce them, while giving you the access you need to the information essential for you to perform your job. More and more organizations are preventing individuals from saving information outside of the organization's formalized program (through technology and/or policies) because the litigation and regulatory risks and consequences are simply too high.

Andy J. Moon

After writing this post, I went through some of my old CD archives just to see what was in there. I saw old blog posts, reports for systems that we don't even have in production any more, paperwork for a network that doesn't even exist anymore, basically just heaps of files that would still be on my servers if I wasn't so anal about cleaning out my data. How long has it been since your data got a good spring cleaning?

Andy J. Moon

...and my current concern is exactly that, a small department. I would love to be able to enforce the kinds of policies you are talking about, but in education, that doesn't fly, at least not yet. You are absolutely correct about the concern that the enterprise has. This is the reason we are trying to implement a system that gives us more control over retention of documents, the same control we have over retention of email. As we move towards this system, I am trying to get the users to think of their desktops as commodities so that when there is a problem, we can just push a new image and their data is available to them. The resistance I am getting is over personal material, and I don't have, and am not likely to get, security teams to keep that off our machines. There is so much red tape in the way of such a move in my industry that I will be surprised, but pleasantly so, when it happens.

RitaGurevich

I agree, there are organizations that must maintain data, and make sure it's easily accessible, for litigation purposes. But there are typically time conditions attached to those regulations. For example, some industries require data to be kept for 7 years. But what about data older than 7 years? Perhaps the need to keep that information is at the discretion of the owner. I actually work for a professional services organization that specializes in data clean-up. A sweet spot for us is in the Exchange public folder space. Contents of public folders, for many organizations, do not need to be kept for a certain number of years, and we have a system for surveying the end users, or asking them to certify the data via a self-help web portal. This allows us to delete data that falls outside of regulatory requirements, based on a user-approved workflow. We've cleaned up terabytes of data, saving companies both storage and management costs. Some of our customers do want us to take an extra precautionary step and archive first to really cheap storage that may not be easily accessible but, if absolutely needed, can be retrieved. If anybody is interested in learning more about how we do this for mailboxes, public folders, file systems, etc., send me an email at rgurevich@12sphere.com.

Andy J. Moon

Thankfully, the data in my part of the organization is no longer subject to the most onerous regulations, but you bring up some fantastic points. One of my ulterior motives in having users back up in this manner is to make it easy for them to destroy once they are convinced they don't need the disks. The data that I am concerned about is garbage and needs to be purged for the reasons you stated in addition to mine. I wish I were in an organization where we could expire data and I dream of being in such an environment. We are currently implementing a document management platform that has the ability to do everything you outlined, but there are a lot of users in education that want to hold on to their data forever. They let go a lot more quickly when you can demonstrate that they don't need the data in the first place.

njcsamuels

I would think tapes last longer than optical

oldbaritone

I don't do much routine cleaning-for-the-sake-of-cleaning, because I've been burned too many times. It seemed that every time I moved something off-line, the customer would call up and ask something about it within a couple of weeks. So I have created a solid tree structure to organize where data is kept (forget about "My Documents"), and the tree is organized so that within a few clicks I have isolated information into blocks of a couple dozen files each, at most. With a little work, anything can be found quickly, like Customer/custname/Current/Correspondence or Customer/custname/Archive/Projects. It's amazing how many "new" projects tickle a memory - gee, I remember something like that at ___, and a quick glance at old work reminds me of a framework for a solution to the current problem - and I have the added bonus of reviewing the pitfalls we found along the way. It's amazing how many more deals you close when you say "Yes, I've done something like that, and you know, we had a challenge with ___. To avoid a pitfall like that, you might want to consider ___." When "big" HDDs were 10 megabytes, a few unneeded 200K files made a big difference. Now, with terabyte drives at $150.00, I don't worry about it. I used to maintain tape or CD archives, but these days it's easier to leave it on-line. Just grab an entire sub-directory, and move it to the Archive side when it becomes stale. I still do quarterly DVD backups for data protection, and just use USB thumb drives for ongoing backup of current work.

Selltekk

I work for a state government IT shop. We oversee the entire State's infrastructure. We have recently begun moving everyone to Enterprise Vault, so as we migrate people in, we do away with PST files. Part of the issue with PST files is that as they get larger, the chance of encountering corruption increases. Working for a State agency, everyone seems to think that they need to keep everything for public record, and hand-holding 5000 users who have a hard time even creating an Excel spreadsheet through this type of cleaning task is futile. Perhaps in this situation we need to look into some type of data deduplication archiving solution.

CrypticDancer

It's a long fight. Basically you need to get all your team saying the same message. That's up and down. Tough if you haven't got management support. If users insist on keeping data locally, disclaim all responsibility for it. There's a saying that a country's population gets the police force it deserves. I think, equally, IT users get the IT support they deserve; it's a two-way thing.
