I attended Storage
Field Day 4 a couple of weeks ago in San Jose, CA. One of the companies we
had the pleasure of hearing from was Coho
Data, a startup that will be releasing its DataStream 1000 scale-out storage
product later this month. A scale-out approach moves away from monolithic
storage arrays. Instead, there will be one or, more likely, several nodes that
each contain compute and storage resources but act as a single device with
high availability and failover.
Ramana Jonnala, Coho Data CEO and Co-Founder, started the presentation
with a brief introduction and mentioned that the company had obtained Series B
(second-round) financing. He then handed the presentation over to Andy Warfield, CTO and
Co-Founder, who went on to wow the entire Storage Field Day panel.
The DataStream 1000 GA product will be a 2U box with four PCIe
flash cards, four 10 Gb Ethernet ports, and 40 TB of capacity storage behind it,
which we were told will come out to about $2.50/TB. Coho Data says IOPS will
increase linearly as you plug in more boxes, and it promises low latency due to its use of
Although I just described the hardware features, Coho Data
says it's really a software company whose product runs on commodity hardware,
though not exactly a white box: Coho Data decided to ship a physical appliance
to ensure performance. Each unit contains two microarrays, each with its own
networking, storage, and compute. Storage is presented entirely over NFS. See Figures A and B.
The networking is handled by Arista Networks 10 GbE switches. Coho Data
is taking advantage of many things that the Arista switch offers, in
particular the use of OpenFlow.
During the presentation, someone asked whether you could
use other switches, since OpenFlow is an open standard; although it's possible, it's
not supported at this time. This is not just software-defined
networking (SDN) for SDN's sake; the goal is to take advantage of the scaling
options it opens up. They're able to isolate "per tenant" traffic: think of a
cloud with several organizations, companies, or departments (the tenants) within the
Also, it allows them more flexibility with WAN traffic
management. Warfield explained that the OpenFlow controller works with three
tables in the switch that tell packets (or, more accurately, flows) where to go. The
first table is Layer 2, the MAC address layer (see the OSI model for more information).
The switch sees a certain MAC address and sends the traffic to the matching
port. The second table is Layer 3, the IP layer, which does a similar thing but
uses IP addresses. The third table is a ternary content-addressable memory (TCAM)
table, which is much more arbitrary: it can match on any of the bits within the
flow to direct it to a port. (Read
this SDNCentral article about TCAM usage in OpenFlow.) Coho Data can use this
SDN technology to balance the workload across the microarrays initially, as
well as to rebalance it when workloads or microarrays are added or removed.
The connected host uses only a single NFS IP address, and
the switch decides which microarray each flow should go to, which makes
management from the host a lot easier. You can see a demo of how this works at
about 15 minutes into the Building
a Flash-Based Distributed System with SDN video.
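To make the TCAM idea a bit more concrete, here is a minimal Python sketch of ternary (value/mask) matching used to spread clients of a single NFS IP across microarrays. This is only an illustration, not Coho Data's implementation: the `TernaryRule` class, the `build_rules` and `steer` helpers, the microarray names, and the choice to match on the low bits of the client's source IP are all my own assumptions.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class TernaryRule:
    """One TCAM-style entry: bits where mask is 0 are 'don't care'."""
    value: int
    mask: int
    out_port: str  # hypothetical label for the port a microarray sits behind

    def matches(self, key: int) -> bool:
        return (key & self.mask) == (self.value & self.mask)


def build_rules(microarrays: list[str], bits: int = 4) -> list[TernaryRule]:
    """Split clients into 2**bits buckets by the low bits of their source IP
    (an assumed choice of match field) and deal the buckets out round-robin."""
    mask = (1 << bits) - 1
    return [
        TernaryRule(value=bucket, mask=mask,
                    out_port=microarrays[bucket % len(microarrays)])
        for bucket in range(1 << bits)
    ]


def steer(rules: list[TernaryRule], client_ip: str) -> str:
    """Return the microarray that a client's flow would be forwarded to."""
    key = int(IPv4Address(client_ip))
    for rule in rules:
        if rule.matches(key):
            return rule.out_port
    raise LookupError("no rule matched")


# Every client mounts the same NFS IP; the rules decide where the flow lands.
rules = build_rules(["microarray-1", "microarray-2"])
before = steer(rules, "10.0.0.5")

# Adding a microarray only means installing a new rule set; hosts change nothing.
rules = build_rules(["microarray-1", "microarray-2", "microarray-3"])
after = steer(rules, "10.0.0.5")
```

In this toy version, rebalancing is just regenerating the rules. A real controller would presumably also have to deal with flows that are already in progress, which the sketch ignores.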
Coho Data also uses object storage, which is widely thought
to be more flexible and scalable than traditional storage (block and file). Objects
can be spread across several disks, and since Coho Data replicates our
VMDKs between two failure domains, we are assured of data integrity even though
the data has been distributed over these various disks. See Figure C for a
basic picture of the architecture. (For more information on object storage, read
Enrico Signoretti's blog post Object Defined Storage.)
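As a rough illustration of what "replicating between two failure domains" can mean, here is a small Python sketch that picks one disk in each of two different domains for every object. The presentation did not describe Coho Data's actual placement logic, so the hashing scheme, the `place_replicas` function, and the chassis and disk names are assumptions made purely for the example.

```python
import hashlib


def place_replicas(object_id: str,
                   domains: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Choose one disk in each of two distinct failure domains.

    `domains` maps a failure-domain name to its (non-empty) list of disks.
    The choice is deterministic, so any node can recompute it from the ID.
    """
    if len(domains) < 2:
        raise ValueError("need at least two failure domains")

    # Derive a stable integer from the object ID.
    digest = int.from_bytes(
        hashlib.sha256(object_id.encode("utf-8")).digest()[:8], "big")

    names = sorted(domains)
    first = digest % len(names)
    second = (first + 1) % len(names)  # always differs from `first`

    placements = []
    for name in (names[first], names[second]):
        disks = domains[name]
        placements.append((name, disks[digest % len(disks)]))
    return placements


# Hypothetical layout: two chassis, each acting as its own failure domain.
layout = {
    "chassis-a": ["a-disk0", "a-disk1", "a-disk2"],
    "chassis-b": ["b-disk0", "b-disk1", "b-disk2"],
}
copies = place_replicas("vm-42.vmdk", layout)
```

Because the two copies never share a domain, losing a whole chassis in this toy layout still leaves one replica of every object readable.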
Coho Data is exploring in-depth analytics. When nodes are
idle or just being used less, the DataStream will be able to take advantage of
the resources to run traces and figure out when workload is routinely higher.
For instance, if you're re-indexing your database at 2:00 AM
every morning, the DataStream could pre-fetch the data at 1:00 AM
in preparation for the 2:00 AM re-indexing. As Warfield points out, it could also
prepare to wind down these nightly workloads in time for VDI users to start
logging in at 9:00 AM later that morning.
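The scheduling idea can be sketched in a few lines of Python: find the hour of day that is routinely busiest in a trace, then plan the pre-fetch one hour ahead of it. Coho Data did not show how its analytics work, so the trace format of `(day, hour, iops)` tuples and both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def recurring_peak_hour(trace: list[tuple[int, int, float]]) -> int:
    """Return the hour of day (0-23) with the highest average IOPS.

    Each trace entry is an assumed (day, hour, iops) sample.
    """
    by_hour: dict[int, list[float]] = defaultdict(list)
    for _day, hour, iops in trace:
        by_hour[hour].append(iops)
    return max(by_hour, key=lambda hour: mean(by_hour[hour]))


def prefetch_hour(peak_hour: int, lead_hours: int = 1) -> int:
    """Hour at which to start warming data, wrapping around midnight."""
    return (peak_hour - lead_hours) % 24


# Three days of samples: quiet most of the time, busy at 2:00 AM every day.
trace = [(day, hour, 9000.0 if hour == 2 else 1000.0)
         for day in range(3) for hour in range(24)]

peak = recurring_peak_hour(trace)
warm_at = prefetch_hour(peak)
```

With this sample trace the peak is found at hour 2 and the pre-fetch lands at hour 1, matching the re-indexing example above.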
The UI looks pretty modern and, at first glance, seems
simple to follow. It is built entirely in HTML5, which is awesome: no more figuring out
which version of Java or Flash you need on your system. It also allows you to
take per-VM snapshots directly from the UI and schedule snapshots if
you need to take them routinely. There are interactive pictures that give
you information on the various components and help you troubleshoot. For
instance, if you've wired it incorrectly, the UI will alert you. You can also
tag VMs according to business unit for showback.
Coho Data’s presentation about DataStream was fascinating.
The company is putting modern components together and using open source
innovation to stay ahead of the curve. For more information, watch all of the videos
from Coho Data at Storage Field Day 4.