Coho Data puts modern components together and uses open source innovation in its DataStream 1000. Get details about this product.
I attended Storage Field Day 4 a couple of weeks ago in San Jose, CA. One of the companies we had the pleasure of hearing from was Coho Data, a startup that will be releasing its DataStream 1000 scale-out storage product later this month. The scale-out methodology moves away from monolithic storage arrays: one node or, more likely, several nodes each contain compute resources and storage but act as a single device with high availability and failover.
Ramana Jonnala, Coho Data CEO and Co-Founder, started the presentation with a brief introduction and mentioned that the company had obtained Series B (second-round) financing. Then he handed the presentation to Andy Warfield, CTO and Co-Founder, who went on to wow the entire Storage Field Day panel.
The DataStream 1000 GA product will be a 2U box with four PCIe flash cards, four 10 GbE ports, and 40 TB of capacity storage behind it, which we were told will come out to about $2.50/TB. IOPS will increase linearly as you plug in more boxes, and the product promises low latency thanks to its use of an OpenFlow-like controller.
Although I just described the hardware features, Coho Data says it's really a software company running on commodity hardware, though not exactly a white box: it decided to ship a physical appliance to ensure performance. Each unit contains two microarrays, each with its own networking, storage, and compute. Storage is presented entirely over NFS. See Figures A and B.
The networking is handled by Arista Networks 10 GbE switches. Coho Data is taking advantage of many things the Arista switch offers, in particular its OpenFlow support.
During the presentation, someone asked whether you could use other switches, since OpenFlow is an open standard; although it's possible, it's not supported at this time. This is not just software-defined networking (SDN) for SDN's sake; the point is to take advantage of scaling options. They're able to isolate per-tenant traffic (think of a cloud with several organizations, companies, or departments as tenants).
Also, it allows them more flexibility with WAN traffic management. Warfield explained that within the OpenFlow controller we have three tables that tell packets (or, more accurately, flows) where to go. The first table is Layer 2, the MAC address layer (see the OSI model for more information): the switch sees a certain MAC address and sends the flow to the matching port. The second table is Layer 3, the IP layer, which does a similar thing but uses IP addresses. The third table is a Ternary Content Addressable Memory (TCAM) table, which is much more arbitrary: any of the bits within the flow can be used to direct it to a port. (Read this SDNCentral article about TCAM usage in OpenFlow.)
Coho Data can use this SDN technology to balance the workload across the microarrays initially, and to rebalance it when workloads or microarrays are added or removed. A connected host uses only a single NFS IP address, and the switch decides which microarray each flow should go to, making management from the host a lot easier. You can see a demo of how this works at about 15 minutes into the Building a Flash-Based Distributed System with SDN video.
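To make the three-table idea concrete, here is a minimal Python sketch of a lookup that tries ternary rules first, then the IP table, then the MAC table, and of a rebalance that simply rewrites the ternary rules. Everything in it is an assumption made for illustration: the names Flow, TernaryRule, Switch, and balance_rules are hypothetical, the lookup order is a guess, and matching on the low bits of the client's source IP is just one possible choice of bits. It is not Coho Data's or Arista's implementation.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Flow:
    dst_mac: str
    dst_ip: str
    src_ip: str


@dataclass(frozen=True)
class TernaryRule:
    value: int  # bit pattern to compare against
    mask: int   # 1 bits are compared, 0 bits are wildcards ("don't care")
    port: int

    def matches(self, key: int) -> bool:
        return (key & self.mask) == (self.value & self.mask)


@dataclass
class Switch:
    l2: Dict[str, int] = field(default_factory=dict)         # MAC -> port
    l3: Dict[str, int] = field(default_factory=dict)         # IP  -> port
    tcam: List[TernaryRule] = field(default_factory=list)    # first match wins

    def lookup(self, flow: Flow) -> Optional[int]:
        # Assumed priority: ternary rules, then Layer 3, then Layer 2.
        key = int(ipaddress.ip_address(flow.src_ip))
        for rule in self.tcam:
            if rule.matches(key):
                return rule.port
        if flow.dst_ip in self.l3:
            return self.l3[flow.dst_ip]
        return self.l2.get(flow.dst_mac)


def balance_rules(ports: List[int]) -> List[TernaryRule]:
    """Spread clients over microarray ports using the low bits of the source IP."""
    n = len(ports)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch only handles a power-of-two port count")
    return [TernaryRule(value=i, mask=n - 1, port=p) for i, p in enumerate(ports)]


# Every client mounts the same NFS address; the switch picks the microarray.
NFS_IP = "10.0.0.100"
switch = Switch(l3={NFS_IP: 1}, tcam=balance_rules([1, 2]))
client_a = Flow("02:00:00:00:00:01", NFS_IP, "192.168.1.10")
client_b = Flow("02:00:00:00:00:01", NFS_IP, "192.168.1.11")
before = (switch.lookup(client_a), switch.lookup(client_b))

# Two more microarrays are plugged in: rewrite the rules, clients change nothing.
switch.tcam = balance_rules([1, 2, 3, 4])
after = (switch.lookup(client_a), switch.lookup(client_b))
print(before, after)
```

The point of the sketch is that the clients never change their mount target; only the rule list on the switch changes when capacity is added.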
Coho Data also uses object storage, which is widely thought to be more flexible and scalable than traditional block and file storage. An object can be spread across several disks, and since Coho Data replicates our VMDKs between two failure domains, we are assured of data integrity even though the data has been distributed over these various disks. See Figure C for a basic picture of the architecture. (For more information on object storage, read Enrico Signoretti's blog post Object Defined Storage.)
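As a rough illustration of replicating between two failure domains, the Python sketch below picks one disk in each of two distinct domains for a given object. The presentation did not describe Coho Data's actual placement algorithm, so the hash-based choice, the function name place_replicas, and the domain and disk names are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def place_replicas(object_id: str, domains: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return two (domain, disk) pairs that are guaranteed to be in different domains."""
    if len(domains) < 2:
        raise ValueError("need at least two failure domains")
    if any(not disks for disks in domains.values()):
        raise ValueError("every failure domain needs at least one disk")

    # Deterministic hash so the same object always maps to the same places.
    digest = int.from_bytes(hashlib.sha256(object_id.encode()).digest()[:8], "big")
    names = sorted(domains)
    first = digest % len(names)
    second = (first + 1) % len(names)  # always differs from first when len >= 2

    placement = []
    for index in (first, second):
        name = names[index]
        disks = domains[name]
        placement.append((name, disks[digest % len(disks)]))
    return placement


# Hypothetical layout: two microarrays, each treated as its own failure domain.
domains = {
    "microarray-a": ["disk0", "disk1"],
    "microarray-b": ["disk0", "disk1"],
}
placement = place_replicas("vm42.vmdk/chunk-0", domains)
print(placement)
```

Losing either domain still leaves one copy of every object, which is the property the paragraph above describes.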
Coho Data is exploring in-depth analytics. When nodes are idle or lightly used, the DataStream will be able to use the spare resources to run traces and figure out when workload is routinely higher.
For instance, if you're re-indexing your database at 2:00 AM every day, the DataStream could possibly pre-fetch all the data at 1:00 AM to prepare for the 2:00 AM re-indexing. As Warfield points out, it could also prepare to wind down these nightly workloads in time for VDI users to start logging in at 9:00 AM later that morning.
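The re-indexing example can be sketched in a few lines of Python: average the traced load per hour of day, find the routinely busiest hour, and schedule the pre-fetch one hour earlier. The trace format of (day, hour, IOPS) samples and the function names are assumptions made for this sketch; Coho Data did not describe how its analytics would actually work.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Tuple

Sample = Tuple[int, int, float]  # (day index, hour of day 0-23, observed IOPS)


def busiest_hour(trace: Iterable[Sample]) -> int:
    """Hour of day with the highest average load across all traced days."""
    by_hour = defaultdict(list)
    for _day, hour, iops in trace:
        by_hour[hour].append(iops)
    return max(by_hour, key=lambda hour: mean(by_hour[hour]))


def prefetch_hour(trace: Iterable[Sample], lead_hours: int = 1) -> int:
    """Hour at which to start pre-fetching, wrapping around midnight."""
    return (busiest_hour(trace) - lead_hours) % 24


# A week of synthetic samples with a nightly spike at 2:00 AM.
trace = [(day, hour, 5000.0 if hour == 2 else 200.0)
         for day in range(7) for hour in range(24)]
print(busiest_hour(trace), prefetch_hour(trace))
```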
The UI looks pretty modern and, at first glance, seems simple to follow. It is built entirely in HTML5, which is awesome: no more figuring out which version of Java or Flash you need on your system. It also lets you take per-VM snapshots directly from the UI and schedule snapshots if you need to take them routinely. There are interactive pictures that give you information on the various components and help you troubleshoot. For instance, if you've wired it incorrectly, the UI will alert you. You can also tag VMs according to business unit for showback purposes.
Coho Data's presentation about DataStream was fascinating. The company is putting modern components together and using open source innovation to stay ahead of the curve. For more information, watch all of the videos from Coho Data at Storage Field Day 4.
Also read: Coho Applies SDN To Scale-Out Storage by Howard Marks on Network Computing