As Hadoop adoption progresses, enterprises have to view it as a platform for real-time, vitally important cloud solutions, says WANdisco's David Richards.
"The second generation of Hadoop," said WANdisco CEO David Richards, "is designed to support real-time applications, so it's important we start thinking of Hadoop as an applications platform, not just a storage system."
Richards added that as Hadoop infrastructure grows, enterprises need to think carefully about service level agreements (SLAs) and continuous availability, and adjust to a new normal in which their cloud applications are mission-critical.
WANdisco (short for Wide Area Network Distributed Computing) has locations in Sheffield, England, and Silicon Valley. The company provides "enterprise-ready, non-stop" solutions for organizations operating distributed networks. Richards cofounded the firm in 2005.
In this email Q&A, Richards also advised new startup teams to consider a best-of-both-worlds "hybrid" headquarters model: keeping core operations in Silicon Valley, which has its benefits but is also highly competitive, and locating other parts of the firm elsewhere.
TechRepublic: What do enterprises most need to know about Hadoop adoption?
David Richards: Hadoop version 1.0 was all about batch-processing use cases. The second generation of Hadoop is designed to support real-time applications, so it's important we start thinking of Hadoop as an applications platform, not just a storage system. Previously, Hadoop deployment was largely limited to lab scenarios and small-scale deployments, but lately we are seeing more and more companies move into production with very large deployments. Spark running on Hadoop seems to be the platform of choice for most deployments.
TechRepublic: What are the major trends in your competitive space: continuous-availability enterprise applications?
David Richards: As big data/Hadoop storage infrastructure moves from small lab environments to large-scale production, customers have to think about SLAs for their core applications, which makes continuous availability and recovery critical. Second, we are seeing a major trend of applications moving from behind the firewall into the cloud, which makes cloud migration and hybrid cloud solutions critical, particularly for active-active use cases.
WANdisco offers guaranteed SLAs and is the only vendor that provides the technology needed to support these types of use cases.
TechRepublic: What are the biggest pitfalls for companies trying to manage distributed networks?
David Richards: Distributed systems and networks tend to assume that the underlying network is reliable, which is a problem because wide area networks (WANs) are not 100% reliable. Applications built on that assumption are highly likely to run into major problems, if not fail outright. Fortunately for our customers, WANdisco's network, algorithms, and patents are designed with fault tolerance and failure in mind. We replicate data over a network without assuming that it is completely reliable.
TechRepublic: Looking at WANdisco's active-active replication technology, how was it built, and what can it do?
David Richards: Our patented active-active replication technology was conceived by our chief scientist, inventor, and cofounder, Dr. Yeturu Aahlad, a worldwide authority on distributed computing. It was Dr. Aahlad's vision and persistence that led to the invention of active-active replication, a technology many thought was impossible at the time.
Our core patents describe a way for companies to achieve true active-active replication at WAN scope. The acid test of active-active replication is whether companies can make changes (writes) at every site. With WANdisco's technology, the answer is categorically "yes." Other technologies that claim to be active-active do not allow writes everywhere. For example, the technology closest to WANdisco's is limited to a single metropolitan area because of network latency, that is, the limits on how fast data can travel between sites.
TechRepublic: What business need were you pursuing with the August release of WANdisco Fusion for Hadoop?
David Richards: We rolled out WANdisco Fusion for Hadoop because the product eliminates the need for expensive "bump-in-the-wire" network optimization hardware and software that can add hundreds of thousands of dollars of cost with significant complexity. These cost savings will be on top of the major hardware cost savings WANdisco Fusion brings to Hadoop deployments.
With WANdisco Fusion's active-active architecture, all servers and clusters are fully readable and writable, always in sync, and recover automatically from each other after planned or unplanned downtime. There are no passive, read-only backup servers and clusters that are used only when the primary active cluster goes offline. As a result, WANdisco Fusion customers get 100% use of their hardware, without wasting a significant amount of their hardware budget on idle backup servers.
WANdisco Fusion's features are critical for enterprises looking to scale up large Hadoop deployments. Customers have told us they are seeing significant ROI, with savings on the order of 50% in hardware costs alone. And while ease of use generally isn't the first thing that comes to mind when one thinks about big data, WANdisco Fusion simplifies the Hadoop experience in a way that is truly unique in this space.
In September, we also announced that WANdisco Fusion had achieved Oracle Big Data Appliance Optimized status through Oracle PartnerNetwork (OPN), and in October, we shared that a leading 24/7 provider of financial data, breaking news, and expert analysis on business sectors critical to the global economy will use WANdisco Fusion in concert with the Hortonworks Data Platform.
TechRepublic: A Sheffield, UK native and a Silicon Valley success—I'm not the first interviewer to mention your bio. Looking back, what was the biggest reason you chose to pursue your career in the Valley?
David Richards: I started the business in Silicon Valley because it's simply the only place to be. Silicon Valley is a magnet for some of the greatest minds in the industry, such as Dr. Aahlad. I believe he is one of the top two or three distributed computing experts in the world. But it is worth remembering that half of WANdisco, including a core part of the team, is based in Sheffield.
TechRepublic: On that note, if you met a young startup team trying to decide whether to locate in Silicon Valley, what would you say to them?
David Richards: Silicon Valley has a lot of benefits, such as brilliant minds and access to venture capital. There are many success stories here, and being based in Silicon Valley gives many startups immediate credibility. But to really scale a business, some of the most successful startups operate with a hybrid model: that is, they have core operations in Silicon Valley, but also operate parts of their business in other geographic locations. This can prove very effective, as Silicon Valley is incredibly competitive.