A startling investment
A couple of years ago I did some consulting work for an organization with chronic backup problems. The company had several dozen users and backed up 10TB of data, including SQL databases, email servers, critical server operating systems, and user/departmental documents. These documents consisted of Office/text/PDF files, multimedia presentations, pictures, software installers, and scripts. Some of these objects were stored in home directories and others in file shares for group access. The organization had a primary and a secondary site for redundancy, and systems in both sites were part of the backup matrix.
Figure A
The backup system groaned under the weight of all that data, and the software had significant issues such as failed jobs, application errors, and misbehaving functions. As we worked out a resolution plan, we conducted a cost analysis of the backup environment in case we needed to discuss alternative options. This analysis included servers, tape drives, DAS (direct attached storage) units, hard disks, backup tapes, licenses, support, off-site tape storage, and training.
The results were shocking. In three years the company had paid over $285K for their backup environment, which was now sputtering on life support. These costs were broken down into the following segments:
- Original implementation: $113K
- Offsite storage: $109K
- Software support: $24.5K
- Licenses/upgrades/training: $17K
- Backup tapes: $13K
- Hard drives: $6.8K
- Hardware support: $4K
Granted, over a third of the cost involved offsite storage, a requirement for this company to ensure data security and business continuity since they handled financial information. However, this is a cost requirement shared by many organizations and might typically be expected as part of the overall backup process.
Our analysis left out the expenses associated with the file and application servers themselves, but it was enough to conclude that data was both an asset and a liability to this organization. The company depended on its data, yet over three years it had spent more on backing that data up than all of its servers had cost put together!
Organizing the data
During the analysis of the backup system, we loosely sorted the data into four categories:
- Data which didn’t need to be backed up (operating systems on redundant servers, installation files for open source software, old project data that had been duplicated elsewhere)
- Data which only needed to be backed up once for safekeeping and then removed from production storage (Outlook PST files from former users, earlier document revisions)
- Data which needed to be backed up and had to stay on local disks for performance or security considerations (SQL and Exchange databases, confidential company documents containing personally identifiable data, and files critical to business operations)
- Data which needed to be backed up and could be kept locally or remotely with good security controls (user scripts, email archive files, project spreadsheets, press releases, etc.)
Our goal was to whittle down the amount of information being backed up to optimize the environment, then look at what we could back up less often or relocate elsewhere, including cloud storage. We approached the data using these four categories and adjusted the backup settings accordingly.
We used incremental backups instead of full backups where possible. We ran one-time backups of low-value data and then removed it from the servers. We used virtualization to convert physical systems into virtual machines and snapshotted those VMs rather than backing up their operating systems. We made sure critical information would remain available and could still be recovered, following a newly written set of business continuity guidelines. We got the total backup volume down from 10TB to 6TB, which helped the jobs run more smoothly once we corrected the other backup software problems.
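To keep those decisions consistent across jobs, it helps to write the four categories down as an explicit policy map. Here's a hypothetical sketch in Python; the category names, schedules, and fields are my own shorthand for illustration, not settings from any particular backup product:

```python
# Hypothetical policy map for the four data categories described above.
# Names and actions are illustrative shorthand only.

BACKUP_POLICIES = {
    "no_backup": {        # redundant OS images, open-source installers, duplicated project data
        "backup": False,
        "action": "exclude from all jobs",
    },
    "archive_once": {     # former users' PST files, old document revisions
        "backup": True,
        "schedule": "one-time full, then remove from source",
    },
    "local_only": {       # SQL/Exchange databases, confidential documents
        "backup": True,
        "schedule": "nightly incremental, weekly full",
        "allow_offsite_relocation": False,
    },
    "relocatable": {      # scripts, mail archives, spreadsheets, press releases
        "backup": True,
        "schedule": "nightly incremental, weekly full",
        "allow_offsite_relocation": True,   # candidate for cloud storage
    },
}

def policy_for(category: str) -> dict:
    """Return the backup policy for a data category (raises KeyError if unknown)."""
    return BACKUP_POLICIES[category]
```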
Throughout this task we documented the size, backup cost, and server costs for user and email data:
| Data | Description | Backup cost | Server HW/OS cost |
| --- | --- | --- | --- |
| 2TB | User data (file shares/home directories) | $19K/year | $1,500 |
| 0.5TB | Email databases | $4.75K/year | $3,000 |
We discussed whether it would be feasible to relocate some or most of these files to the cloud to reduce the local storage and backup burdens. We determined that some information had to stay local for security purposes (see category #3 above), but the bulk of it was less confidential and could be kept offsite under appropriately secure conditions (see category #4 above). It helped that the confidential “must stay” documents were generally kept in two encrypted locations, and that users had long known how to classify data in order to decide where to store it. (For more information on that concept, I recommend reviewing the subject of data classification to see how it applies to your environment.)
Considering an alternative
The backup issues were resolved, and no move was made to relocate data to the cloud at that time. However, the company recently contacted me to revisit the possibility of moving permitted user data offsite and to look at the details of a hosted email environment. They didn't want an exhaustive analysis, just some preliminary figures to review. They chose to begin with Google's offerings since they liked Google products, their users were familiar with Gmail and Google Drive, and they felt Google was a trustworthy organization.
We looked at Google Apps for Business and determined that accounts for their 108 users would cost $5250 per year ($50 per user) which would give them email, calendars and 5GB each of Google Drive storage totaling 540GB. The individual Google Drive accounts could replace user home directories, but for shared company files we’d have to get more drive space.
The company determined that about 1.5TB of their data could be kept on Google servers, so we reviewed the business storage pricing involved. (Figure B)
Figure B
Adding 2TB of storage capacity was the best bet and would cost the company $2148 per year.
This meant the organization could expect to pay $7398 per year for hosted email and storage space on Google systems (of course, user training and updating departmental procedures cost money as well, since they involve employee time and labor). The email and data would be available locally or remotely, securely accessible, searchable, and backed up by Google, and users would be able to access previous versions of files for up to 30 days.
Compared to the $4.5K they had spent on server hardware and the $23.75K invested in backing this data up (totaling $28.25K), the Google Apps option represented an annual cost of roughly a quarter of what they had previously paid. They could also save money on server warranty renewals and future upgrades.
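A quick sanity check of those figures, using only the numbers quoted above (the variable names are mine):

```python
# Rough annual cost comparison using the figures quoted above.
google_apps_accounts = 5250     # hosted email + per-user Drive storage, per year
extra_drive_storage = 2148      # additional 2TB of Google Drive space, per year
cloud_total = google_apps_accounts + extra_drive_storage       # $7,398

server_hardware = 4500          # file share + email server HW/OS
backup_spend = 19000 + 4750     # backup cost for user data + email databases
current_total = server_hardware + backup_spend                 # $28,250

print(f"Cloud: ${cloud_total:,} vs. current: ${current_total:,}")
print(f"Ratio: {cloud_total / current_total:.0%}")             # roughly 26%
```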
Working out the details
The considerations were laid out in the following order during a one-hour meeting:
Security and privacy standards had to be maintained. I've discussed privacy in the past, since the subject is paramount to any company that wants to stay in business while operating online. Google has a page on security and privacy that spells out its policies (whether you trust them or feel safe is a separate story). Every business considering an offsite data move has to be aware of these ramifications before taking the step, and it may not be a step some organizations can take, depending on their compliance regulations. In this case we decided two-factor authentication was a must, along with other strong Google Apps security settings.
We needed to establish how to assign the 2TB of Google Drive space for departmental/shared folders and how to recreate the Windows file shares so they presented the same layout to users. These shares would continue to be centrally managed by the IT department. Each department had a series of complex network folders with selectively granted permissions; in some cases subfolders had rights removed so that only key managers could view the contents. It wasn't a simple case of "everyone in Finance has full access to the Finance folder, everyone in HR has full access to the HR folder," and so forth. Inheritance of permissions (whereby access granted to a top-level folder "trickles down" to subfolders unless permissions on those subfolders have been set differently) had to work for this to proceed.
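To make that inheritance requirement concrete, here's a toy model of the "trickle down unless overridden" behavior we needed; the folder names and groups are hypothetical examples, not the client's actual layout:

```python
# Toy model of permission inheritance: a folder uses its own ACL if one is
# set, otherwise it inherits from the nearest ancestor with an explicit ACL.
# Folder names and groups are hypothetical examples.

EXPLICIT_ACLS = {
    "Finance": {"Finance-Staff", "Finance-Managers"},
    "Finance/Payroll": {"Finance-Managers"},   # override: managers only
    "HR": {"HR-Staff"},
}

def effective_acl(path: str) -> set:
    """Walk up the folder tree until an explicit ACL is found (inheritance)."""
    parts = path.split("/")
    while parts:
        acl = EXPLICIT_ACLS.get("/".join(parts))
        if acl is not None:
            return acl
        parts.pop()
    return set()                               # no access defined anywhere

print(effective_acl("Finance/Reports/2013"))   # inherits from "Finance"
print(effective_acl("Finance/Payroll/Q1"))     # restricted to managers
```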
We reviewed how to move individual home directories to Google Drive. This would be the easiest part: we could install the Google Drive application on user PCs or Macs, thereby creating a local Drive folder for them to use. They could move their home directory contents into this new Drive folder, which would then synchronize to Google's servers, where they could access and share items.
We decided to get in touch with Google support to explain our considerations and ask for advice on how to handle the data migration and permission/folder sharing details. Google support provided us with the following response:
- Inheritance of permissions in Google Drive works as it does in Windows folders.
- To claim the new 2TB of space, we would have to assign it to a specific user. This would mean creating a "master user" who would then administer all of the folder sharing and permissions.
- It is not possible to migrate existing file/folder permissions from a Windows server to Google Drive. However, we could use the Google Drive SDK to write scripts that manage permissions on the Google Drive side.
- There are some limitations on file and folder sharing in Google Drive: a folder can be shared with a maximum of 200 people, and there is a 10GB individual file size limit, which meant some user PST files would not be eligible for storage on Google Drive.
- To migrate the data, we could use the Google Drive application and have the network share contents synchronize into the Google Drive volume. To avoid network problems while synchronizing, Google support recommended moving files over in batches of 5,000 or fewer. This would be quite a task considering the company had well over 240,000 files!
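To gauge the effort involved in that last point, here's a rough sketch of how the batching might be scripted. The paths and pause interval are placeholders, and the approach (copying through the locally synced Drive folder in 5,000-file batches) simply follows Google support's recommendation above:

```python
# Minimal sketch: copy files from a network share into the locally synced
# Google Drive folder in batches, pausing between batches so the Drive
# client can catch up. Paths are hypothetical; the batch size follows the
# 5,000-file recommendation from Google support.
import shutil
import time
from pathlib import Path

SOURCE = Path(r"\\fileserver\departments")      # hypothetical network share
DEST = Path(r"C:\Users\svc_migration\Google Drive\departments")
BATCH_SIZE = 5000
PAUSE_SECONDS = 30 * 60                         # let the sync client drain

files = [p for p in SOURCE.rglob("*") if p.is_file()]
print(f"{len(files)} files to migrate in batches of {BATCH_SIZE}")

for start in range(0, len(files), BATCH_SIZE):
    batch = files[start:start + BATCH_SIZE]
    for src in batch:
        target = DEST / src.relative_to(SOURCE)  # preserve folder layout
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)                # copy first; clean up source later
    print(f"Copied files {start + 1}-{start + len(batch)}; waiting for sync...")
    time.sleep(PAUSE_SECONDS)
```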
We analyzed the concept of using the Google Drive SDK to script the sharing permissions. If the scripts didn't fit our needs, the desired permissions would need to be granted all over again by hand, unless a third-party solution was brought in to assist with the migration project. We found that Gladinet Cloud offers many advanced options for Google Drive, including file server migration capabilities. Also, a company called Sada Systems, Inc. offers a product called "File server migration to Google Drive."
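To give a sense of what that scripting involves, here is a minimal sketch using the current Drive API v3 Python client rather than the SDK calls available at the time; the folder ID, group address, and service account credential file are placeholders, and this illustrates the general approach, not the exact scripts we evaluated:

```python
# Minimal sketch: grant a department group edit access to a shared Drive
# folder via the Google Drive API. Folder ID, group address, and credential
# file are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/drive"]
creds = service_account.Credentials.from_service_account_file(
    "master-user-credentials.json", scopes=SCOPES
)
drive = build("drive", "v3", credentials=creds)

FINANCE_FOLDER_ID = "0Bxxxxxxxxxxxx"   # placeholder folder ID

# Give the Finance group edit rights on the top-level folder; per Google
# support, access granted on a folder carries down to its contents.
drive.permissions().create(
    fileId=FINANCE_FOLDER_ID,
    body={"type": "group", "role": "writer",
          "emailAddress": "finance@example.com"},
    sendNotificationEmail=False,
    fields="id",
).execute()
```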
We decided to shelve the concept for the time being because of these details. Having a master user own all the data would be too inconvenient, moving the data would be tedious, and the complexity of redoing all the permissions made Google's solution inadvisable; it was simply too much of a sea change. The security and privacy controls, hosted email, and the relocation of user home directories to Google Drive all looked like good solutions, but we tabled the idea. My client elected to analyze the Google Drive SDK approach and the third-party products mentioned above, and also decided to conduct an internal permissions audit to get a better handle on the work involved in a future data migration.
The takeaway
This story serves as a good example of the fact that, contrary to some of the whitepapers and advertising you might come across, cloud solutions aren't an instant band-aid for every situation. You have to examine the right issues, ask the proper questions, and envision the potential pitfalls every step of the way. Sure, it's attractive to spend less money by hosting your data elsewhere, but the headaches and entanglements may carry a greater cost. I do not intend to make the blanket statement that "the cloud is over-hyped and can't deliver"; quite the contrary, this migration scenario would have worked perfectly well for an organization with a simpler data set, such as one with a few top-level network folders whose permissions could quickly be reconfigured. My point is that sometimes we can't rush evolution; sometimes we have to proceed carefully and deliberately.
I suspect my client and I will revisit this issue at some point and find that Google Drive has evolved to match our needs, or discover that we can go forward using scripting/a third party solution. If that opportunity arises I’ll keep you posted.