The City of Ryde's disaster-recovery tests showed that comprehensive documentation is essential, covering everything from software licence keys and hardware DIP switch settings to staff contact lists and the contents of backup tapes.
The City of Ryde, 12 kilometres north-west of the Sydney CBD, opted for an outsourced cold disaster-recovery site with a 4Mbps internet connection and desks for 50 of its 450 users. The package costs the city around AU$10,000 per month, including one test per year.
The first test in June was largely a success, the city's business solutions manager Rodney Redwin told the Flexibility 2012 local government IT conference in Coffs Harbour last week. All of their business applications were running within 48 hours — except Exchange.
"We knew Exchange was a clustered environment. We were very naive [about] how complex the cluster was," Redwin told TechRepublic.
Virtualisation made the rapid recovery possible. Only the city's nine physical host servers needed to be rebuilt from scratch. The six or seven virtual servers running on each host could then simply be restored from backup.
Those backups live on a set of three consumer-level NAS boxes, rotated weekly. Each box has four 2TB hard drives running RAID5 for a total of around 5.5TB of usable storage. Databases are backed up separately to tape, along with other user data.
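The 5.5TB figure is consistent with ordinary RAID5 arithmetic: one drive's worth of capacity is given over to parity, and drives sold as 2TB (decimal) show up smaller when reported in binary units. A minimal sketch of that calculation follows. It assumes this is how the figure was arrived at, which the city did not spell out, and it ignores file system overhead. The function name is illustrative only.

```python
TB = 10**12   # decimal terabyte, as printed on the drive label
TIB = 2**40   # binary tebibyte, as many operating systems report it


def raid5_usable_bytes(drive_count: int, drive_bytes: int) -> int:
    """Usable capacity of a RAID5 set: one drive's worth goes to parity."""
    if drive_count < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (drive_count - 1) * drive_bytes


usable = raid5_usable_bytes(4, 2 * TB)
print(usable / TB)             # 6.0 decimal terabytes
print(round(usable / TIB, 2))  # 5.46 binary tebibytes
```

On that reading, each four-drive box holds 6TB in decimal terms, or about 5.46 in binary terms, which matches the "around 5.5TB" quoted.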
Redwin's tips include:
Have enough installation media, and test them all. Some burnt DVDs can't be read on servers. "It just didn't dawn on us, until I'm standing there putting a DVD from one server to the next server to the next server, that we've only got one media," he said
Make sure you've got the right versions of installation media to match the software versions you're now running
Record which updates and hotfixes you'll need to apply, and have copies of those files. "In the second test, we didn't actually patch Backup Exec, and we had a couple of minor hiccups with restores that didn't necessarily go properly," Redwin said
Record all your software licence keys and the names they were registered under, along with log-in details for all vendors' licence portals. "Adobe's licence portal is awful, truly awful," Redwin said. He now has seven different accounts there, named variously "City of Ryde", "Ryde", "Ryde Council", and even after individuals whose names appeared on purchase orders
Record all of your passwords, including those for databases, backup users, and server administrators. Redwin currently stores them in a password-protected spreadsheet, something that he admits isn't ideal. "There are better solutions to manage passwords out there ... but ultimately, you need a centralised tool that everyone on the IT department can access," he said
Label backup media in a human-readable form, not just bar codes. "No one ever bothers to write on the tape what jobs it did, and one tape can do six or seven jobs," Redwin said. "I think we've got over 200 tapes in our safe now. So you think, 'Oh, s***, which tape do I bring back to my disaster-recovery centre to actually do my restore?'"
Record your hardware configuration, right down to the level of DIP switch settings
Record your WAN configuration
Keep staff contact lists so that you can coordinate the recovery
Store copies of all of this documentation separately from your production systems.
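One way to keep track of these tips is to treat them as a checklist and flag any section of the recovery documentation that is still empty. The sketch below is only an illustration: the section names are paraphrased from the list above, the sample entries are placeholders, and nothing here reflects a tool the City of Ryde actually uses.

```python
# Hypothetical section names, paraphrased from the tips above.
REQUIRED_SECTIONS = [
    "installation_media",
    "patches_and_hotfixes",
    "licence_keys",
    "passwords",
    "backup_media_labels",
    "hardware_configuration",
    "wan_configuration",
    "staff_contacts",
]


def missing_sections(runbook: dict) -> list:
    """Return required sections that are absent or empty, in checklist order."""
    return [name for name in REQUIRED_SECTIONS if not runbook.get(name)]


# Placeholder entries for illustration only.
runbook = {
    "installation_media": ["server OS DVD, three copies, read-tested"],
    "licence_keys": ["vendor portal log-in and registered name"],
    "staff_contacts": [],  # section exists but nobody has filled it in
}

for name in missing_sections(runbook):
    print("Still to document:", name)
```

A section counts as missing when it is absent or empty, so a heading with nothing under it is still flagged. The output is only useful if the documentation it checks is stored away from the production systems, as the last tip says.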
The key message from the City of Ryde's tests is that vast amounts of institutional knowledge reside in people's heads.
"Oh God, yes! And we actually had one of our key members absent one of the days of our DR test, and all of a sudden that was noticeable. So not only do you have to disaster-recover your systems, you have to disaster-recover your knowledge, as well," Redwin said.