Cloud-based storage startup ClearSky is making a bid to replace your on-premises storage array. Keith Townsend breaks down his concerns about the approach.
In my interview with the company's CEO and cofounder Ellen Rubin, she proposed replacing customers' primary storage with ClearSky's cloud-based solution. She isn't talking object storage such as Amazon S3, but rather replacing low-latency SAN arrays such as EMC VNX or Tintri arrays with cloud-hosted storage.
When I heard the pitch, my initial temptation was to dismiss the solution as not only technically challenging but risky. But, with storage veterans such as Paula Long, cofounder of EqualLogic and DataGravity, on the board of directors of ClearSky, I figured I needed to take the company seriously.
My two main reservations
It's important to explain why I was initially dismissive of not only the premise of the technology but the applicability of the solution.
From a technical perspective, there's a reason SAN arrays exist and why they reside on-premises. The primary challenge is latency. As Rubin and her team are quick to point out, you can't solve the problem of the speed of light. Any cloud-based solution would need to solve the problem of the time it takes to make a round-trip from a customer's data center to the provider's data center.
Data center operators measure latency to on-premises SAN arrays in the microsecond-to-low-millisecond range. Typical latency from your application server to storage would be 2ms. A good connection to your cloud provider is typically 70ms. That may work for some cloud-native apps, but hosting your Exchange Server on this type of latency is a non-starter.
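To put the speed-of-light point in numbers, here is a rough back-of-the-envelope sketch. It assumes light in optical fiber covers about 200 km per millisecond (roughly two-thirds of its speed in a vacuum); the distances are illustrative examples of mine, not figures from ClearSky. Real round trips will be higher once routing, switching, and queuing are added.

```python
# Rough lower bound on round-trip time imposed by distance alone.
# Assumption: light in optical fiber covers about 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float) -> float:
    """Best-case round trip (ms) over a fiber path of the given one-way length."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Illustrative one-way distances (hypothetical, not from the article).
examples = [
    ("same metro area", 50),
    ("neighboring region", 500),
    ("cross-country", 4000),
]

for label, km in examples:
    print(f"{label:>20}: {min_round_trip_ms(km):5.1f} ms minimum round trip")
```

Even before any equipment delay, a storage target thousands of kilometers away cannot answer in the low single-digit milliseconds a local array delivers, which is why distance to the provider matters so much.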
While providers have made headway in convincing larger enterprises to move services to the cloud, primary storage is a silo that commonly resists change. I've had enough trouble trying to convince customers to move from Fibre Channel to iSCSI or to abandon monolithic arrays for Server SAN architecture, so the idea that the same companies would be willing to outsource their primary storage to a cloud provider was difficult to accept.
Details about ClearSky's solution
ClearSky takes a reasonable approach to solving a good portion of the latency challenges by designing the storage system in three layers. The first layer is a cache device that provides low-latency access to "hot," or frequently accessed, data. The cache device, co-located in the customer's data center, is an all-flash dual-system board appliance that doubles as an access gateway to the offsite data. The WAN connections (dual 1Gbps circuits) are provided and managed by ClearSky, and they connect to a point of presence (POP) co-location site. The POP houses the second layer of storage, which holds a second tier of less frequently accessed data. The POP in turn connects to ClearSky's cloud provider, which hosts the cold data as the third layer.
With the cache layer in place, users shouldn't notice a performance difference in latency between a traditional SAN array and the ClearSky solution; it's when the client needs to access cold data that there are potential latency concerns. ClearSky claims the combination of cache and low latency to the POP provides comparable performance to a traditional on-premises array.
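A quick way to reason about that claim is to treat average read latency as a weighted sum across the three layers. The sketch below is my own illustration, not ClearSky's model: the 2ms local and 70ms cloud figures come from the numbers quoted earlier, while the 10ms POP latency and the hit fractions are hypothetical values chosen only to show the arithmetic.

```python
def expected_latency_ms(tiers):
    """Average read latency for a list of (fraction_of_reads, latency_ms) pairs.

    The fractions describe where reads are served from and must sum to 1.
    """
    total = sum(fraction for fraction, _ in tiers)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("read fractions must sum to 1")
    return sum(fraction * latency for fraction, latency in tiers)


LOCAL_CACHE_MS = 2.0   # on-premises figure quoted earlier in the article
POP_MS = 10.0          # hypothetical metro POP latency (assumption)
CLOUD_MS = 70.0        # cloud figure quoted earlier in the article

# Hypothetical workloads: (share served by cache, by POP, by cloud).
workloads = {
    "mostly hot data": (0.90, 0.08, 0.02),
    "colder data mix": (0.70, 0.20, 0.10),
}

for name, (cache, pop, cloud) in workloads.items():
    avg = expected_latency_ms(
        [(cache, LOCAL_CACHE_MS), (pop, POP_MS), (cloud, CLOUD_MS)]
    )
    print(f"{name}: about {avg:.1f} ms average read latency")
```

Under these assumed numbers, the average stays in single digits only while the cache absorbs the large majority of reads. An average also hides the tail: every read that falls through to the cloud tier still pays the full round trip.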
From a customer acceptance perspective, Rubin understands there are customers for whom hosting primary storage offsite is not an option. The target customer is an environment large enough to have dedicated storage arrays but no dedicated storage team. The target customer would also be open to public cloud technologies. With the wider acceptance of solutions such as Office 365, I find the premise that there's a market for hosted primary storage plausible.
ClearSky is offering the typical data services you'd expect in a mid-tier storage array. Features such as thin provisioning and deduplication offer space savings. Backup and recovery are supported with snapshot and data replication options. ClearSky has a clean-looking interface that may appeal to the targeted market segment.
I can think of interesting use cases for ClearSky. One is the recent trend to deploy multiple Tier-1 data centers within a metro area. By clustering multiple Tier-1 data centers, data managers can provide Tier-3 or even Tier-4 level availability at a lower aggregate cost. ClearSky's POP-dependent architecture makes it ideal for these metro-level data center designs.
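The availability math behind that trend can be sketched quickly. The percentages below are the commonly cited availability figures associated with the data center tiers, used here as assumptions. The calculation also assumes sites fail independently and that any surviving site can serve the data, which is optimistic for facilities sharing one metro area.

```python
HOURS_PER_YEAR = 8760

# Commonly cited availability figures per tier (assumptions for illustration).
TIER_AVAILABILITY = {1: 0.99671, 3: 0.99982, 4: 0.99995}


def combined_availability(site_availability: float, sites: int) -> float:
    """Availability of a cluster that is up whenever at least one site is up.

    Assumes site failures are independent of one another.
    """
    return 1 - (1 - site_availability) ** sites


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime per year at the given availability."""
    return (1 - availability) * HOURS_PER_YEAR


single = TIER_AVAILABILITY[1]
paired = combined_availability(single, 2)

print(f"One Tier-1 site:  {downtime_hours_per_year(single):.1f} hours down per year")
print(f"Two Tier-1 sites: {downtime_hours_per_year(paired):.2f} hours down per year")
print(f"Pair beats the Tier-4 figure: {paired > TIER_AVAILABILITY[4]}")
```

On paper, two independent Tier-1 sites exceed the Tier-4 figure. In practice, correlated failures such as a regional power or network outage eat into that margin.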
While my initial skepticism has eased, I'm still in a wait-and-see state on some of my performance and reliability concerns. As ClearSky takes on additional customers, any gaps in the design will appear. It's an interesting solution that not only relieves the burden of managing storage but also provides additional flexibility in data center design.
I'd love to hear your thoughts and questions about cloud-hosted primary storage. Please post them in the comments.