By Seth Miller, Miller Systems
Using cloud-based file storage, sync, and
collaboration systems is very popular, and a great fit for
organizations of all sorts and sizes. Here’s the catch, though: If you’re in an
environment where users are doing more than email and basic office apps, the cloud
doesn’t always make sense – and evaluating the viability of a move to cloud can
be very difficult.
Let’s bring this a bit closer to the
ground. You’re in IT for a group working in media production.
- You may have as few as 40 or 50 total
users – but you wouldn’t know it from looking in the server room. That probably
looks more like a “typical business” with 250 or more users. Dozens or hundreds
of terabytes, lots of servers (physical or virtual, doesn’t matter).
- You’re dealing with big files (video, high-fidelity audio,
high-res images, etc.).
- You’re not 100% Windows at the
desktop. You’ve got a split of Macs and Windows, maybe even a few Linux boxes.
- You’ve had SANs/NAS/large DAS RAIDs
for many years by the time 2013 rolled around. When GigE LAN became standard,
- Let’s assume you don’t have a
multi-gigabit internet connection, for sake of discussion.
My friend, if this all sounds
familiar, you are in a textbook high-performance
LAN environment. Assuming you don’t have a multi-gigabit connection to the
internet, you may have a real problem moving to the cloud.
So what do you do when the CxO says, “I need you to move all our systems to the
cloud right away”?
On paper, cloud looks great. You’re
going to save a bundle of money, have fewer headaches to deal with, fewer systems
to manage, etc. But you suspect that if you move your systems and data
that run at LAN speeds to the cloud, productivity could grind to a proverbial
halt. But at some point, you may need to prove it to senior management, and, “I’ve
got a bad feeling about this,” probably won’t cut it, Han. Besides, don’t you really want to know for sure before
putting up a fight?
You are going to need some data. High-performance
LAN environments are rarely, if ever, designed to perform well over
slow links, and that scenario is rarely analyzed as a result. Of course there’s that other
classic IT occupational hazard: The way we think
systems get used and the way they actually
get used don’t always match, right?
Our team recently went through an
exercise like this; here’s how we went from “pretty sure” to “confident and
informed” about a client’s desire to move systems to the cloud.
Zeroing in on what you really need to know and understand
The primary application and source of
LAN traffic in this case was Windows Server SMB file sharing,
with about a 50/50 split of Mac and Windows clients. We knew that if we could reliably
capture the following data points, from every server on the network, on every
file operation (CRUD) that was performed, we’d be able to analyze just about
anything we’d conceivably want:
- Name of accessing user/AD account
- Accessing system name and/or IP
- Date and Timestamp
- File Size
- Full path to file
This might look like a small number of data points to capture, but in our case, over
the course of about a month, there were over 10 million file operations that
occurred across about 20 servers. If you are reasonably handy with
Excel, Access, SQL Server, etc., that’s all the data you need. It was enough
for our team on a recent project like this.
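If you would rather script it than use Excel or Access, here is a minimal sketch of what one captured record could look like once the audit data has been exported to CSV. The column names (`user`, `client`, `timestamp`, `size_bytes`, `path`, `operation`), the ISO-style timestamps, and the sample rows are all assumptions for illustration, not the output format of any particular auditing tool.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FileOp:
    """One file operation, holding the data points listed above."""
    user: str            # accessing user / AD account
    client: str          # accessing system name and/or IP
    timestamp: datetime  # date and timestamp of the operation
    size_bytes: int      # file size
    path: str            # full path to the file
    operation: str       # hypothetical column: create/read/update/delete


def load_ops(handle):
    """Parse CSV rows (with a header line) into FileOp records."""
    ops = []
    for row in csv.DictReader(handle):
        ops.append(FileOp(
            user=row["user"],
            client=row["client"],
            timestamp=datetime.fromisoformat(row["timestamp"]),
            size_bytes=int(row["size_bytes"]),
            path=row["path"],
            operation=row["operation"],
        ))
    return ops


# Tiny made-up sample; a real capture would be read from exported files.
SAMPLE = """user,client,timestamp,size_bytes,path,operation
jdoe,10.0.0.15,2013-03-04T09:12:44,52428800,//fs01/video/cut1.mov,read
asmith,MAC-EDIT-02,2013-03-04T19:30:01,1024,//fs01/docs/notes.txt,update
"""
ops = load_ops(io.StringIO(SAMPLE))
```

With millions of rows you would probably load this into a database instead, but the record shape stays the same.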
Analyzing a large sample is essential
Infrequent flurries of high-traffic activity often represent the most
time-sensitive moments (like a monthly production deadline or a product
release). If you don’t capture several weeks or months of traffic, you might
miss them. The table below shows us lots of excellent data points: the size of
the largest individual file transferred, the frequency of individual transfers
over 20MB, the total number of operations, and the total data transferred. But
look how different one day or week can be from the next.
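A per-day summary with those four columns can be sketched in a few lines. This assumes each operation has been reduced to a `(timestamp, size_bytes)` pair and treats "over 20MB" as 20 MiB; the function name and sample records are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

LARGE_BYTES = 20 * 1024 * 1024  # the "over 20MB" threshold, taken here as 20 MiB


def daily_summary(ops):
    """ops: iterable of (timestamp, size_bytes). Returns {date: stats dict}."""
    days = defaultdict(lambda: {"ops": 0, "bytes": 0, "largest": 0, "over_20mb": 0})
    for ts, size in ops:
        d = days[ts.date()]
        d["ops"] += 1                          # total number of operations
        d["bytes"] += size                     # total data transferred
        d["largest"] = max(d["largest"], size) # largest individual file
        if size > LARGE_BYTES:
            d["over_20mb"] += 1                # count of large transfers
    return dict(days)


# Made-up records for illustration only.
sample = [
    (datetime(2013, 3, 4, 9, 0), 5_000_000),
    (datetime(2013, 3, 4, 10, 0), 30_000_000),
    (datetime(2013, 3, 5, 14, 0), 1_000),
]
summary = daily_summary(sample)
```

Grouping by `ts.isocalendar()[:2]` instead of `ts.date()` would give the same figures per week.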
Slice, dice, and DON’T over-aggregate
It’s really dangerous to rely on
averages for the entire period captured. Note in the example above how downwardly
skewed the averages are compared to the highs and lows.
Here’s a better example: Each row in
the heat map below represents a day; each column is the hour of day (0-23). The
cells and totals represent the total amount of data transferred in MB between
all of the users and servers over the course of a month.
You don’t need to be a “big data
expert” to figure out that most traffic occurs around 9-5 M-F at this place, but
let’s look a little closer. Using this data set, look how badly misled we would
be if we planned around the following:
Yep, that’s right, there were nearly
50GB of files transferred between 7 and 9pm on our busiest day – 135
times the average of all other days during the same period. “Outliers” in this type of analysis can
frequently be the most essential data points.
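Here is one way the day-by-hour grid and the "busiest day versus everyone else" comparison could be computed. It is a sketch under the same assumption as before (operations as `(timestamp, size_bytes)` pairs); the function names and the sample numbers are invented, and the figures are not the client's data.

```python
from collections import defaultdict
from datetime import datetime


def heat_map(ops):
    """Return {date: [bytes per hour 0..23]} from (timestamp, size_bytes) pairs."""
    grid = defaultdict(lambda: [0] * 24)
    for ts, size in ops:
        grid[ts.date()][ts.hour] += size
    return dict(grid)


def window_peak_vs_rest(grid, start_hour, end_hour):
    """Compare the busiest day in [start_hour, end_hour) with the mean of the others."""
    totals = {day: sum(hours[start_hour:end_hour]) for day, hours in grid.items()}
    peak_day = max(totals, key=totals.get)
    others = [v for day, v in totals.items() if day != peak_day]
    mean_others = sum(others) / len(others) if others else 0.0
    return peak_day, totals[peak_day], mean_others


# Made-up traffic: three evenings plus one morning transfer that the
# 7-9pm window (hours 19 and 20) should ignore.
sample = [
    (datetime(2013, 3, 4, 19, 30), 100),
    (datetime(2013, 3, 5, 20, 15), 300),
    (datetime(2013, 3, 6, 19, 5), 9_000),
    (datetime(2013, 3, 6, 20, 59), 1_000),
    (datetime(2013, 3, 6, 9, 0), 50_000),
]
grid = heat_map(sample)
peak_day, peak_bytes, mean_others = window_peak_vs_rest(grid, 19, 21)
ratio = peak_bytes / mean_others
```

The point of keeping the full grid around is that you can ask this question for any window, not just the one you expected to be busy.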
Traps (and a few tips)
“Let’s just move some of it to the cloud and see what happens” (aka, spaghetti testing)
If you don’t understand the
interactions between the users and the systems, and assuming that you don’t have
a LAN-speed connection between your users and the cloud servers, you could
really be in for a bad time. Think about how bad the night in our “7-9pm” example
would be if it happened to occur the week after you moved your servers to the
cloud. And how long would it take you to roll everything back to on premises
if it fails? Yikes! You might just be out of a job.
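To put a number on that "7-9pm" night, a back-of-the-envelope calculation helps. The sketch below rounds "nearly 50GB" up to 50 decimal gigabytes, ignores protocol overhead and latency, and uses a 20 Mbps uplink purely as a hypothetical; substitute your own figures.

```python
def required_mbps(total_bytes, window_seconds):
    """Average megabits per second needed to move total_bytes within the window."""
    return total_bytes * 8 / window_seconds / 1_000_000


def transfer_hours(total_bytes, link_mbps):
    """Hours to move total_bytes over a link, ignoring protocol overhead and latency."""
    return total_bytes * 8 / (link_mbps * 1_000_000) / 3600


burst_bytes = 50 * 10**9                        # "nearly 50GB", rounded up
needed = required_mbps(burst_bytes, 2 * 3600)   # the two-hour 7-9pm window
hours_at_20 = transfer_hours(burst_bytes, 20)   # 20 Mbps is a hypothetical uplink
print(f"{needed:.1f} Mbps sustained; {hours_at_20:.1f} h at 20 Mbps")
```

Under those assumptions the burst needs roughly 56 Mbps sustained for two hours, and would take about five and a half hours on the hypothetical 20 Mbps link. Real-world numbers will be worse once overhead is included.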
High frequency vs. large files
We found that a high frequency of file
operations can be just as bad as – or in some cases, a lot worse than – having a
large aggregate amount of data to transfer. One example we saw in testing: It
took almost the same amount of time for us to copy a 75MB folder full of just
over 1300 files as it did to copy a single 2GB file to the same cloud server
on the same network. If your users use “semi-automated” build or batch file
scripts to do their work, this can be a very serious issue.
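A very simple model shows why this can happen: treat each file as carrying a fixed cost (round trips, open/close, metadata) on top of the time to move its bytes. The link speed and per-file overhead below are hypothetical values picked so the two copies come out close, as they did in our test; they were not measured.

```python
def transfer_seconds(n_files, total_bytes, link_mbps, per_file_overhead_s):
    """Simple model: fixed cost per file plus payload time at the link rate."""
    payload_s = total_bytes * 8 / (link_mbps * 1_000_000)
    return n_files * per_file_overhead_s + payload_s


LINK_MBPS = 100      # hypothetical effective throughput to the cloud server
OVERHEAD_S = 0.12    # hypothetical per-file cost in seconds

# 75MB spread over 1300 files versus a single 2GB file (decimal units).
many_small = transfer_seconds(1300, 75 * 10**6, LINK_MBPS, OVERHEAD_S)
one_large = transfer_seconds(1, 2 * 10**9, LINK_MBPS, OVERHEAD_S)
print(f"many small files: {many_small:.0f} s, one large file: {one_large:.0f} s")
```

In this model the small-file copy spends almost all of its time on per-file overhead, so more bandwidth barely helps it. That is why operation counts from your capture matter as much as byte totals.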
“Let’s just move the desktops to the cloud too”
That may not be simple, easy, desirable,
or cost effective. Not too many people out there are editing video or working
in Photoshop via some sort of terminal
services solution. What about the Macs? There are niche solutions out there, but they’re not cheap, mature, or
simple to deploy and manage. Hey, wasn’t “cheap and simple to manage” supposed
to be the point of moving to the cloud in the first place?
Dogma is dangerous: Don’t forget the business
Run a sanity check against the business goals behind a process like this. A dogmatic
agenda of “everything must move to the cloud” can lead to over-engineering –
and negative ROI.
Seth Miller founded Miller
Systems in 1995. The firm focuses on human beings first and technology
second, helping organizations to improve and manage their web sites, intranets,
portals, and day to day IT operations.