Seth Miller demonstrates why it’s crucial to gather all the relevant data about how your systems are used before moving anything to the cloud.
By Seth Miller, Miller Systems
Using cloud-based file storage, sync, and collaboration systems is very popular, and a great fit for many organizations of all types and sizes. Here’s the catch, though: if you’re in an environment where users do more than email and basic office apps, the cloud doesn’t always make sense, and evaluating the viability of a move to the cloud can be very difficult.
Let’s bring this a bit closer to the ground. You’re in IT for a group working in media production.
- You may have as few as 40 or 50 total users, but you wouldn’t know it from looking in the server room, which probably looks more like that of a “typical business” with 250 or more users: dozens or hundreds of terabytes and lots of servers (physical or virtual, it doesn’t matter).
- You’re dealing with big files (video, high-fidelity audio, high-resolution images, etc.).
- You’re not 100% Windows at the desktop. You’ve got a mix of Macs and Windows PCs, maybe even a few Linux boxes.
- By the time 2013 rolled around, you’d had SANs, NAS, or large DAS RAIDs for many years. When gigabit Ethernet (GigE) became standard on the LAN, productivity soared.
- Let’s assume, for the sake of discussion, that you don’t have a multi-gigabit internet connection.
My friend, if this all sounds familiar, you are in a textbook high-performance LAN environment, and without a multi-gigabit connection to the internet, you may have a real problem moving to the cloud.
So what do you do when the CxO says, “I need you to move all our systems to the cloud right away”?
On paper, the cloud looks great. You’re going to save a bundle of money, have fewer headaches to deal with, fewer systems to manage, and so on. But you suspect that if you move systems and data that run at LAN speeds to the cloud, productivity could grind to a proverbial halt. At some point, you may need to prove it to senior management, and “I’ve got a bad feeling about this” probably won’t cut it, Han. Besides, don’t you want to know for sure before putting up a fight?
You are going to need some data. High-performance LAN environments are rarely, if ever, designed to perform well over slow links, so how they would behave over one is rarely analyzed. And then there’s that other classic IT occupational hazard: the way we think systems get used and the way they actually get used don’t always match, right?
Our team recently went through an exercise like this; here’s how we went from “pretty sure” to “confident and informed” about a client’s desire to move systems to the cloud.
Zeroing in on what you really need to know and understand
The primary application and source of LAN traffic in this case was Windows Server SMB file sharing, with about a 50/50 split of Mac and Windows clients. We knew that if we could reliably capture the following data points from every server on the network, for every file operation (create, read, update, or delete), we’d be able to analyze just about anything we could conceivably want:
- Name of accessing user/AD account
- Accessing system name and/or IP
- Date and timestamp
- File size
- Full path to file
This might look like a small number of data points, but in our case, over the course of about a month, more than 10 million file operations occurred across about 20 servers. If you are reasonably handy with Excel, Access, SQL Server, or similar tools, that’s all the data you need. It was enough for our team.
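To make the list of data points concrete, here is a minimal sketch of one captured record in Python. It assumes the audit events have already been exported to CSV with hypothetical column names (`user`, `host`, `timestamp`, `size_bytes`, `path`, `op`). The article doesn’t say how the team exported its data, so the format, field names, and sample rows are illustrative only.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileOp:
    """One file operation; fields mirror the data points listed above."""
    user: str            # accessing user / AD account
    host: str            # accessing system name or IP
    timestamp: datetime  # date and time of the operation
    size_bytes: int      # file size
    path: str            # full path to the file
    op: str              # hypothetical: "create", "read", "update" or "delete"


def parse_ops(csv_text: str) -> list[FileOp]:
    """Parse a hypothetical CSV export (with a header row) into FileOp records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        FileOp(
            user=row["user"],
            host=row["host"],
            timestamp=datetime.fromisoformat(row["timestamp"]),
            size_bytes=int(row["size_bytes"]),
            path=row["path"],
            op=row["op"],
        )
        for row in reader
    ]


# Hypothetical sample export; raw string so the UNC backslashes survive.
SAMPLE = r"""user,host,timestamp,size_bytes,path,op
CORP\alice,MAC-EDIT01,2013-06-03T09:15:00,2147483648,\\fs01\projects\promo\master.mov,read
CORP\bob,WIN-AUDIO02,2013-06-03T19:42:10,52428800,\\fs02\audio\mix\take3.wav,update
"""

ops = parse_ops(SAMPLE)
print(Counter(o.op for o in ops))  # operations by type
```

From here, the records can be loaded into Excel, Access, or SQL Server as described, or summarized directly in code.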
Analyzing a large sample is essential
Regular but infrequent flurries of high-traffic activity often mark the most time-sensitive moments (like a monthly production deadline or a product release). If you don’t capture several weeks or months of traffic, you might miss them. The table below shows several useful data points: the size of the largest individual file transferred, the number of individual transfers over 20MB, the total number of operations, and the total data transferred. But look how different one day or week can be from the next.
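Per-day figures like these are straightforward to derive once the raw records exist. A minimal sketch, assuming each operation has been reduced to a `(timestamp, size_bytes)` pair and that MB means 2**20 bytes; `daily_summary` and the sample rows are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

MB = 1024 * 1024      # assumption: MB = 2**20 bytes
LARGE_FILE = 20 * MB  # the 20MB threshold used in the table


def daily_summary(ops):
    """Group (timestamp, size_bytes) pairs by calendar day and compute the
    table's columns: operation count, total bytes, largest file, and how
    many individual transfers exceeded 20MB."""
    days = defaultdict(lambda: {"ops": 0, "total_bytes": 0, "largest": 0, "over_20mb": 0})
    for ts, size in ops:
        day = days[ts.date()]
        day["ops"] += 1
        day["total_bytes"] += size
        day["largest"] = max(day["largest"], size)
        if size > LARGE_FILE:
            day["over_20mb"] += 1
    return dict(days)


# Hypothetical sample: a quiet day and a deadline day.
ops = [
    (datetime(2013, 6, 3, 9, 0), 5 * MB),
    (datetime(2013, 6, 3, 10, 30), 30 * MB),
    (datetime(2013, 6, 4, 19, 30), 2048 * MB),
    (datetime(2013, 6, 4, 20, 15), 1500 * MB),
]

summary = daily_summary(ops)
for day, s in sorted(summary.items()):
    print(day, s["ops"], "ops,", s["total_bytes"] // MB, "MB total,",
          s["largest"] // MB, "MB largest,", s["over_20mb"], "over 20MB")
```

Keeping one row per day, rather than a single figure for the whole capture, is what exposes the gap between a quiet day and a deadline day.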
Slice, dice, and DON’T over-aggregate
It’s really dangerous to rely on averages for the entire period captured. Note in the example above how far the averages fall below the peaks.
Here’s a better example: each row in the heat map below represents a day, and each column is an hour of the day (0-23). The cells and totals show the amount of data, in MB, transferred between all of the users and servers over the course of a month.
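The grid behind a heat map like this can be built by bucketing each operation by date and hour. A minimal sketch, again assuming `(timestamp, size_bytes)` pairs as input; `hourly_matrix` and `column_totals` are hypothetical helper names, and the output is plain text rather than a colored chart.

```python
from collections import defaultdict
from datetime import datetime

MB = 1024 * 1024  # assumption: MB = 2**20 bytes


def hourly_matrix(ops):
    """Return {date: [MB transferred in each hour 0-23]} from
    (timestamp, size_bytes) pairs; each row is one line of the heat map."""
    grid = defaultdict(lambda: [0.0] * 24)
    for ts, size in ops:
        grid[ts.date()][ts.hour] += size / MB
    return dict(grid)


def column_totals(grid):
    """Total MB per hour of day across all days (the heat map's totals row)."""
    return [sum(row[h] for row in grid.values()) for h in range(24)]


# Hypothetical sample records.
ops = [
    (datetime(2013, 6, 3, 9, 5), 5 * MB),
    (datetime(2013, 6, 3, 9, 40), 3 * MB),
    (datetime(2013, 6, 4, 19, 15), 2048 * MB),
]

grid = hourly_matrix(ops)
for day, row in sorted(grid.items()):
    cells = " ".join(f"{mb:5.0f}" for mb in row)
    print(day, cells, f"| total {sum(row):.0f} MB")
print("hour totals:", [round(t) for t in column_totals(grid)])
```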
You don’t need to be a “big data expert” to figure out that most traffic here occurs around 9-5, Monday through Friday, but let’s look a little closer. Using this data set, look how badly misled we would be if we planned around the following:
Yep, that’s right: nearly 50GB of files were transferred between 7 and 9pm on our busiest day, 135 times the average for that same time slot on all other days. “Outliers” in this type of analysis can frequently be the most essential data points.
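A ratio like that comes from comparing one day’s traffic in a specific window with the same window on every other day, rather than with a whole-month average. If the comparison is to the same 7-9pm window, nearly 50GB at 135 times the average means the other days averaged somewhere under 400MB in that slot. The sketch below assumes a grid mapping each date to 24 hourly MB totals, as in a heat map; `window_ratio`, `make_row`, and the sample values are hypothetical.

```python
from datetime import date
from statistics import mean


def make_row(hourly_mb):
    """Build a 24-slot row of MB totals from {hour: MB} (hypothetical helper)."""
    row = [0.0] * 24
    for hour, mb in hourly_mb.items():
        row[hour] = mb
    return row


def window_ratio(grid, day, start_hour, end_hour):
    """Ratio of one day's MB in hours [start_hour, end_hour) to the mean of the
    same window on every other day. grid maps date -> 24 hourly MB totals."""
    def window_total(row):
        return sum(row[start_hour:end_hour])

    target = window_total(grid[day])
    others = [window_total(row) for d, row in grid.items() if d != day]
    baseline = mean(others)
    return target / baseline if baseline else float("inf")


# Hypothetical excerpt: two ordinary days and one late-night crunch.
grid = {
    date(2013, 6, 3): make_row({9: 500.0, 19: 1.0, 20: 2.0}),
    date(2013, 6, 4): make_row({9: 450.0, 19: 2.0, 20: 1.0}),
    date(2013, 6, 5): make_row({9: 480.0, 19: 120.0, 20: 180.0}),
}

# 7-9pm covers hours 19 and 20, i.e. the half-open range [19, 21).
print(window_ratio(grid, date(2013, 6, 5), 19, 21))
# Whole-day comparison for the same crunch day.
print(window_ratio(grid, date(2013, 6, 5), 0, 24))
```

In this made-up sample, the crunch day is 100 times busier than usual in the evening window, yet compared over the whole day (hours 0-24) it looks only about 1.6 times busier: the over-aggregation trap in miniature.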
Traps (and a few tips)
“Let’s just move some of it to the cloud and see what happens” (aka, spaghetti testing)
If you don’t understand how your users interact with your systems, and you don’t have a LAN-speed connection between your users and the cloud servers, you could really be in for a bad time. Think about how bad the night in our “7-9pm” example would be if it happened to fall in the week after you moved your servers to the cloud. And if it fails, how long would it take you to roll everything back on premises? Yikes! You might just be out of a job.
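Some quick arithmetic shows why that night matters. Moving nearly 50GB within a two-hour window requires a sustained rate of roughly 56 Mbps before any protocol overhead, and on slower links the same work stretches well past the window. A minimal sketch; the link speeds are hypothetical, GB is taken as 10**9 bytes, and the results are best-case lower bounds, since latency, overhead, and competing traffic only make them worse.

```python
def transfer_hours(total_bytes: float, link_mbps: float, efficiency: float = 1.0) -> float:
    """Best-case hours to move total_bytes over a link of link_mbps megabits/s.
    efficiency < 1.0 is an assumed fudge factor for overhead and contention."""
    bits = total_bytes * 8
    return bits / (link_mbps * 1_000_000 * efficiency) / 3600


def required_mbps(total_bytes: float, hours: float) -> float:
    """Sustained megabits/s needed to move total_bytes within the given hours."""
    return total_bytes * 8 / (hours * 3600) / 1_000_000


NIGHT_BYTES = 50 * 10**9  # "nearly 50GB" from the 7-9pm example, decimal GB

print(f"needed for a 2-hour window: {required_mbps(NIGHT_BYTES, 2):.1f} Mbps")
for mbps in (1000, 100, 50, 20):  # hypothetical link speeds
    print(f"{mbps:>5} Mbps: {transfer_hours(NIGHT_BYTES, mbps):5.1f} h")
```

The 1000 Mbps row approximates a single GigE link; the slower rows stand in for WAN or internet connections and are illustrative, not measured.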
High frequency vs. large files
We found that a high frequency of file operations can be just as bad as, or in some cases a lot worse than, a large aggregate amount of data to transfer. One example we saw in testing: copying a 75MB folder of just over 1300 files took almost as long as copying a single 2GB file to the same cloud server over the same network. If your users rely on “semi-automated” build or batch scripts to do their work, this can be a very serious issue.
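One way to reason about that result is a simplified model in which every file pays a fixed overhead (the round trips needed to open, create, and close it) on top of bulk transfer time. Solving the model for the numbers in the testing example shows roughly how large that per-file overhead must have been. This is a sketch under stated assumptions, not the team’s method: the 10 MB/s throughput is hypothetical, 2GB is taken as 2048MB, and “just over 1300” is rounded to 1300.

```python
def copy_seconds(n_files: int, total_mb: float, per_file_s: float, mb_per_s: float) -> float:
    """Simplified model: a fixed per-file overhead plus bulk transfer time."""
    return n_files * per_file_s + total_mb / mb_per_s


def implied_per_file_overhead(small_files: int, small_mb: float,
                              big_mb: float, mb_per_s: float) -> float:
    """Per-file overhead that makes many small files take as long as one big file.
    Solves small_files*o + small_mb/T == 1*o + big_mb/T for o."""
    return (big_mb - small_mb) / mb_per_s / (small_files - 1)


THROUGHPUT = 10.0  # hypothetical WAN throughput in MB/s (roughly 80 Mbps)

o = implied_per_file_overhead(1300, 75.0, 2048.0, THROUGHPUT)
print(f"implied overhead: {o * 1000:.0f} ms per file")
print(f"1300 small files: {copy_seconds(1300, 75.0, o, THROUGHPUT):.1f} s")
print(f"one 2GB file:     {copy_seconds(1, 2048.0, o, THROUGHPUT):.1f} s")
```

Under these assumptions each file costs roughly 150 ms, so a script touching thousands of small files spends most of its time waiting on round trips rather than moving data. On a LAN, where round trips typically take well under a millisecond, the same overhead is negligible, which is why the problem may never have surfaced on premises.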
“Let’s just move the desktops to the cloud too”
That may not be simple, easy, desirable, or cost-effective. Not many people are editing video or working in Photoshop through some sort of terminal services solution. And what about the Macs? There are niche solutions out there, but they’re not cheap, mature, or simple to deploy and manage. Hey, wasn’t “cheap and simple to manage” supposed to be the point of moving to the cloud in the first place? That’s a great example of why…
Dogma is dangerous: Don’t forget the business
Remember to sanity-check a process like this against the business goals behind it. A dogmatic agenda of “everything must move to the cloud” can lead to over-engineering and negative ROI.
Seth Miller founded Miller Systems in 1995. The firm focuses on human beings first and technology second, helping organizations improve and manage their websites, intranets, portals, and day-to-day IT operations.