Disaster Recovery

Backups: Network planner's gotcha

Mark Underwood takes a look at the current state of backup technologies and how backup tasks affect capacity planning for network admins who face ever-increasing demands on network resources.

It's the era of HD video, streaming audio, and all that megapixel hype. You can't fight it. In fact, chances are good that you're part of the problem, though perhaps unintentionally. Whom does the network planner have to thank for this circumstance? Thank PowerPoint deck designers, Photoshop-savvy photographers, data warehouse warriors, the virtual machine vanguard, Auto-Tune artists, Web 2.0 marketing mavens, and fervent film fans. The appetite of these specialists, not to mention the steady move toward digital TV, ensures ongoing upgrades of network switches for years to come. But switch upgrades aren't the only network planning chore to consider. Unglamorous, uninteresting, often-overlooked backup tasks can place significant demands on network resources, or even place an enterprise at risk.

A short history of underwhelming progress in backup computing

For those who perform capacity planning, gauging nominal network usage means measuring network speeds, link types, and transfer volumes under typical usage profiles. Such profiles interpret normal usage as the service levels applications are expected to deliver when users need them - during prime shifts, for example - with at least some attention also paid to peak demand capacity.

The backup problem and its complement, data and service restoration, have been with computing from the beginning. When machines were less reliable, it was an ever-present concern; work was performed in small chunks with frequent restarts. Today greater reliability is taken for granted, but the stakes are higher: systems are larger and more interdependent, process more voluminous information, and touch networks in complex ways that challenge the simple backup schemes many smaller organizations trust. Not only that, but we haven't made it any easier for users. In fact, we've removed features that once let users manage at least some of their own backup and restore needs.

Versioning file systems

Those old enough to remember DEC's VMS, now called OpenVMS, experienced file-level versioning. It was very convenient for users, who had control over when to purge versions and when to bring them back into service. It offered a user-specifiable granularity that backup schemes usually neglect to implement. Neither Linux nor Windows supports it natively in 2010, so a spate of add-on hardware and software products has been grafted onto the thirty-five-year-old backup approaches most SMBs employ today.

Several "fixes" have been proposed as backup techniques evolved. For well-financed enterprises prepared to absorb the overhead required for disaster recovery (DR) and backup, perhaps newer technology has proven satisfactory, though perhaps not convenient. For home and desktop users, the current silver bullet seems to be the ubiquitous USB drive. For modestly more complex topologies, network-enabled disk-to-disk schemes seem to have carried the day.

Neither is elegant or friendly to users and system administrators.

Windows backup helper technologies

Volume Shadow Copy Service (VSS)

The Windows VSS service operates at the block level, which lets it present a read-only, point-in-time snapshot of a volume, so even open or locked files can be captured. That is essential to creating a backup that is at least consistent with itself and that permits users to keep working while backups run. (Restores are another matter.) Better backup software makes use of VSS; watch the Windows event log for possible side effects.
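
For the curious, here is a rough Python sketch of how a script might drive VSS directly. It assumes a Windows Server host (the "create shadow" verb of vssadmin is not exposed on client editions) and an elevated prompt; treat it as an illustration, not a replacement for proper backup software.

# Minimal sketch: create and list a VSS shadow copy from a script.
# Assumes Windows Server (vssadmin's "create shadow" verb is unavailable
# on client editions of Windows) and an elevated prompt.
import subprocess

def create_shadow(volume="C:"):
    """Ask VSS for a read-only, point-in-time copy of the volume."""
    result = subprocess.run(
        ["vssadmin", "create", "shadow", "/for=" + volume],
        capture_output=True, text=True, check=True)
    return result.stdout

def list_shadows():
    """List existing shadow copies, e.g. to verify the backup source."""
    return subprocess.run(
        ["vssadmin", "list", "shadows"],
        capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    print(create_shadow("C:"))
    print(list_shadows())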

Bright Idea Department: Windows Home Server

While no reader of this column may be willing to admit it, Windows Home Server offers a flexible backup solution at a very low price point. Though it won't work for most enterprise networks, it illustrates what a clever design can make possible. In fact, Home Server provides several hints for how a backup scheme should operate: it can grow flexibly, can back up onto commodity drives of different sizes, and can handle bare-metal restores. (I have some reason to believe it may not work as well in mixed virtual machine and non-VM environments.)

System state

Windows provides a means for recording the current state of the operating system. Periodic system state saves should be part of regular backups, though coordinating them with other bare-metal restore tools can prove nontrivial. Also, system state snapshots can slow down or even temporarily freeze some tasks while they are being taken.
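
On recent server editions a system state save can also be scripted. The Python wrapper below is a sketch only; it assumes Windows Server 2008 or later with the Windows Server Backup feature installed, an elevated prompt, and a dedicated target volume (E: here).

# Sketch: schedule-friendly wrapper around "wbadmin start systemstatebackup".
# Assumptions: Windows Server 2008+, Windows Server Backup feature installed,
# an elevated prompt, and E: as the backup target volume.
import subprocess

def backup_system_state(target="E:"):
    """Record the current system state (registry, boot files, AD if present)."""
    cmd = ["wbadmin", "start", "systemstatebackup",
           "-backupTarget:" + target, "-quiet"]
    completed = subprocess.run(cmd, capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the failure so the job scheduler (or a human) notices it.
        raise RuntimeError(completed.stderr or completed.stdout)
    return completed.stdout

if __name__ == "__main__":
    print(backup_system_state("E:"))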

Apple Time Machine

Time Machine offers smaller environments a convenient, user-friendly way to recover previous or deleted versions of files. It was a vast improvement over previous Apple offerings, though its usefulness in larger enterprise settings is difficult to assess.

(Separate) SAN + Box

If the budget is available to support one, consider using a SAN, such as Compellent or Dell EqualLogic. Then create a separate network subsystem to offload the backup traffic. This way, the main network resources used to support nominal traffic remain unaffected by backup and restore operations. Such processing can become somewhat involved; for example, consider IBM Tivoli Storage Manager recommendations for one such method. Large enterprises can also consider solutions such as Online Data Vault, InMage, and others.
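
Sizing that separate backup subsystem is mostly back-of-envelope arithmetic. The Python sketch below shows the kind of calculation involved; the change volumes, six-hour window, and protocol-efficiency figure are assumptions for illustration, not measurements from any particular site.

# How fast must a dedicated backup link be to move tonight's changed data
# inside the window? All figures below are illustrative assumptions.

def required_link_gbps(changed_data_tb, window_hours, efficiency=0.7):
    """Sustained rate (Gbps) needed, allowing for protocol overhead."""
    bits = changed_data_tb * 1e12 * 8          # decimal TB -> bits
    seconds = window_hours * 3600
    return bits / seconds / efficiency / 1e9

if __name__ == "__main__":
    for tb in (2, 5, 10):
        print("%d TB in a 6-hour window: %.2f Gbps sustained"
              % (tb, required_link_gbps(tb, 6)))

Even modest nightly change volumes quickly outgrow a shared gigabit segment, which is the argument for keeping backup traffic off the production network in the first place.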

Backup detritus

Backup and restore workflows can easily be tripped up by minor obstacles, such as VM images, dumps, logs, database files, and the Windows update work area. Take just one of these - dumps. A dump of system memory on a machine with 8GB of RAM is likely to be far bigger than a dump from a Windows 2000 workstation with 512MB.
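
A pre-backup sweep can at least flag this detritus so it can be excluded or handled by a separate policy. The Python sketch below is one way to do it; the patterns and size threshold are examples only and would need tuning for a real environment.

# Sketch of a pre-backup sweep that flags detritus -- VM images, crash dumps,
# logs, update caches -- so it can be excluded or given its own policy.
import fnmatch
import os

SKIP_PATTERNS = ["*.vmdk", "*.vhd", "*.dmp", "*.log", "*.tmp",
                 "pagefile.sys", "hiberfil.sys"]
BIG_FILE_BYTES = 500 * 1024 * 1024   # anything over 500 MB gets a second look

def sweep(root):
    skipped, flagged = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue                      # file vanished or is locked
            if any(fnmatch.fnmatch(name.lower(), p) for p in SKIP_PATTERNS):
                skipped.append((path, size))
            elif size > BIG_FILE_BYTES:
                flagged.append((path, size))
    return skipped, flagged

if __name__ == "__main__":
    skipped, flagged = sweep(r"C:\Users")
    print("%d files matched exclusion patterns (%.1f GB left out of the job)"
          % (len(skipped), sum(s for _, s in skipped) / 1e9))
    print("%d large files may deserve their own backup policy" % len(flagged))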

Want to get serious about capacity planning?

Cisco is betting the farm on a continued explosion of Internet speeds and volume. The company's latest forecast calls for 767 exabytes of traffic by 2014, driven largely, Cisco believes, by video demand. Add VoIP and collaboration technologies to the mix, and a network planner may be left rummaging through the toolbox for a better way.

One method worth mentioning, though it requires more cost and effort than some may find worthwhile, is simulation. Firms such as Opnet and Scalable offer ways to describe a network topology and the systems and services it must support, and then simulate network performance under various scenarios. See this Opnet-based student exercise to get a flavor of such an undertaking. Figure A is a screenshot from that exercise.
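
You don't need a commercial simulator to get a feel for the approach. The toy Python Monte Carlo model below asks one of the questions a real simulator would: given variable nightly change volumes and variable effective throughput, how often does the backup overrun its window? Every distribution in it is an assumption chosen for illustration.

# Toy Monte Carlo model: how often does the nightly backup overrun its window?
# The distributions below are illustrative assumptions, not measurements.
import random

def overrun_probability(window_hours=6, trials=10000):
    overruns = 0
    for _ in range(trials):
        changed_tb = random.lognormvariate(1.0, 0.5)   # roughly 2-6 TB most nights
        throughput_gbps = random.uniform(1.0, 2.0)     # effective rate on the backup link
        hours = (changed_tb * 1e12 * 8) / (throughput_gbps * 1e9) / 3600
        if hours > window_hours:
            overruns += 1
    return overruns / trials

if __name__ == "__main__":
    print("Estimated chance of overrunning the window: %.1f%%"
          % (100 * overrun_probability()))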

Seven favorite use cases

1. Files deleted "a while ago"

Apple's Time Machine saves hourly backups for 24 hours, daily backups for a month, and weekly backups for everything older than a month.

2. RAID restore

The bigger the physical drive, the longer it takes to rebuild the RAID from backup (see the arithmetic sketch after this list).

3. Restore events spanning work shifts

Backups can usually operate unmonitored, but restores may not have that luxury. People who know the applications being restored may be needed to help put restored data back into service, and in global or 24/7 operations they may not be on a convenient work shift.

4. Oblivious applications

There's a lot of talk about smarter apps, but many are still oblivious to backup and restore processing. Some require taking down entire user communities, and others try to pass the buck to the database admin.

5. How "big" is a "big file"?

Outlook's quaint classification of file sizes is a legacy due in part to the PST file format it sometimes uses, but when it refers to files "> 5MB" as "Enormous," it hints at how far applications lag behind users' need to work with larger files. The problem of "big files" can cascade rapidly through an organization: user disks fill up, and network traffic increases as files are copied across network shares to local machines, sent via email, or combined with other files for aggregation.

6. Cloud backups ("Where's the phone # for my ISP?")

As users of consumer online backup services such as Carbonite have learned, serious upload speeds are needed to make offsite backup and restore feasible (the sketch after this list puts rough numbers on it). Prepare to pay more to push backup packets to the cloud.

7. Virtual machine backups

Virtual machine backup, restore, and propagation have created a new class of requirements.
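
A quick back-of-envelope for use cases 2 and 6 above: time is just size divided by rate, but the rates differ by orders of magnitude. The rebuild rate and upload speed in the Python sketch below are illustrative assumptions, not vendor figures.

# Back-of-envelope arithmetic for use cases 2 and 6 above.
# Rebuild rate and upload speed are illustrative assumptions.

def hours_to_rebuild(drive_tb, rebuild_mb_per_s=80):
    """RAID rebuild time, bounded by per-drive rebuild throughput."""
    return drive_tb * 1e6 / rebuild_mb_per_s / 3600

def days_to_seed_cloud(data_gb, upload_mbps=1.0, efficiency=0.8):
    """Initial upload of an offsite backup over a consumer link."""
    seconds = data_gb * 8e9 / (upload_mbps * 1e6 * efficiency)
    return seconds / 86400

if __name__ == "__main__":
    for tb in (0.5, 1, 2):
        print("%.1f TB drive: about %.1f hours to rebuild"
              % (tb, hours_to_rebuild(tb)))
    for gb in (50, 250, 500):
        print("%d GB to the cloud at 1 Mbps up: about %.0f days"
              % (gb, days_to_seed_cloud(gb)))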

Extra credit: Major news event

You're British Petroleum. Everyone in the world wants to tap your undersea camera's video stream. Bet you didn't plan for that.

About

Mark Underwood ("knowlengr") works for a small, agile R&D firm. He thinly spreads interests (network manageability, AI, BI, psychoacoustics, poetry, cognition, software quality, literary fiction, transparency) and activations (www.knowlengr.com) from...

10 comments
aparagarwal

Hi guys, it is really nice reading the blog and comments. One point which is unclear to me is how much data is taken into account while differentiating between small and big organizations. I am a backup administrator responsible for backing up some 30TB of data every night, spread over several DCs, including Windows file servers, DB servers, mail servers, etc. The environment is a mix of Unix and Windows platforms. I never face difficulty rebuilding a server from backup versions, and the speed is really fast. Backup runs at a speed of 200 Gigs an hour over the LAN dedicated for backups. Another scenario we use is SAN backup, through which we achieve a speed of around 350 Gigs an hour, which is quite decent I suppose. File count also doesn't matter a lot, but in the case of Windows, I should admit that it is slower, though with no failures. The more files there are, the longer the backup takes, since it has to read them and catalog all the data records in the DB. I would also say that files with huge sizes, say 10 Gigs each, take less time to back up in comparison to 10 Gigs of data spread across, say, 10,000 files. The latest technologies in the market like VSS (already explained in the blog), online replication, backup to disk, de-duplication, etc. make life easier for a backup administrator, especially for data retention and faster recovery. To my thinking, if the volume of data is really high, the organization must have a storage infrastructure implemented, and so should make use of it to perform SAN backups rather than LAN backups. Any comments and advice on the scope for improvement would be appreciated. Thanks.

Willie11

I've been backing up my personal computer to a Seagate FreeAgent Pro using Acronis for some time. I do not recommend this solution. Acronis works fine, but I am on my third FreeAgent drive and this one is now failing. I boot from the Acronis CD and then back up the entire computer (either full or incremental). After that I always try to verify the backup, then disconnect the drive and put it on the shelf till the next time. I am hoping some other external drive works better than this Seagate; I haven't had good luck with it at all. Any suggestions?

mikes

"Neither Linux nor Windows support it in 2010." Novell Storage Services (NSS) has allowed automatic versioning backups on Windows clients for a very long time. It is still an integral part of their Open Enterprise Server. As long as unused hard drive space is available, it will keep backup versions indefinitely. When the salvage space needs to be used for production files, it automatically removes the oldest backups to make room. While it's not exactly plug and play, once it's running, it's a true stressbuster. I have no idea why Novell doesn't plug this diamond more in their marketing.

gary.hewett

You indicate that the *awareness* of software is lagging - I would like to point out that this applies to the cloud or Internet-based solutions as well. I spent years trying many (albeit not quite all) Internet-based solutions (IMHO, BTW, Internet-based does not equal cloud, even though cloud implies Internet-based), and the quality of the software can be a huge issue. Case in point: I brought one service to its knees for MONTHS (it still has never fully recovered my account) simply due to the NUMBER of files needing backup - that service handles large files without issue, but it chokes on large numbers of files, especially when compounded by large numbers of changes. My suggestion if you go the Internet route (and yes, it is feasible - some vendors will even go the extra step of seeding your backup to bypass the initial load): ask a lot of questions and run a lot of tests - including full restores of critical APPLICATIONS and not just raw data.

bradcom

1909 (as the Anglo-Persian Oil Company); 1954 (as the British Petroleum Company); 1998 (merger of British Petroleum and Amoco); 2001 (renamed). Come on, get up to date!

m.finlay

"...then disconnect the drive and put it on the shelf..." Not the recommendation that you were after but I would make sure that shelf is not in the same building in case of fire, flood, theft, etc. I'd recommend storing the backup in another location all-together. Get a 2nd drive so you always have one offsite.

knowlengr

Point taken, and I was unaware of that OES feature. (I was writing about versioning of the main file system, a la VMS, of course, not backup versioning).

knowlengr

Good point. In reviewing the backup logs for one service, I could see how cascading timeouts for smaller files could create problems. Perhaps because of metadata updating -- unclear.

CharlieSpencer

The name has officially been 'KFC' for over a decade, but everybody knows who I'm talking about if I say 'Kentucky Fried Chicken'. A difference that makes no difference is no difference, except to the Public Relations department.