Big data application developers need to navigate between different Hadoop fabrics to meet business requirements. Learn how one company is helping developers meet this need.
Software development using big data is no different from any other kind of software development. Organizations expect quick turnarounds, business requirements change rapidly, and IT must find ways to work across the many networks, operating systems, and software and hardware platforms that enterprise applications traverse.
In one sense, this is initially easier in the big data world where enterprises are simply running on Hadoop, and not trying to reach out to other enterprise systems across the present "divide" that separates big data from other types of data processing. Despite this, there are still interoperability issues in this more constricted big data universe.
A fabric softener for big data?
These issues begin with the fact that there is more than one distribution of Hadoop. Hadoop service providers include Cloudera, Hortonworks, MapR Technologies, Amazon, Microsoft, Rackspace, Intel, IBM, Altiscale, Qubole, and others. Depending on which one you select, the underpinning of any application you develop will be slightly different. This won't matter much if an organization remains focused on a query-only approach to big data that sticks with tools like Hive or Pig. But if the organization is intent on developing enterprise-strength applications that run off big data, having to move between different Hadoop infrastructure fabrics matters.
"Our goal is to make it easy for developers to build data applications on top of Hadoop," said Gary Nakamura, CEO of Concurrent, which provides big data application infrastructure solutions. "The underlying structures of Hadoop can be highly complex, but if you construct an application development framework on top of it that can map to any underlying Hadoop fabric with the use of APIs (application programming interfaces), this frees the developer to focus on the layer of the application that contains the business logic."
Relieving big data application developers of underlying "fabric anxiety" gives IT flexibility in moving from one big data computational fabric to another, because it no longer has to factor the tedium of application migration into its plans. In the future, this means that depending on the business need, you will be able to run a big data application in-memory, or on Apache Hadoop MapReduce, or on other big data computational fabrics. Concurrent calls this "Write once, and deploy on your fabric of choice."
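To make the "write once" idea concrete, here is a minimal sketch of the pattern in Python. It is not Cascading's API: the names `Fabric`, `InMemoryFabric`, `PartitionedFabric`, and `sales_by_product` are hypothetical, and the two "fabrics" are toy stand-ins that run in a single process. The point is only that the business logic is written against one interface and does not change when the execution layer underneath it is swapped.

```python
from abc import ABC, abstractmethod
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, float]  # (product name, sale amount)


class Fabric(ABC):
    """Hypothetical execution layer: sums values per key."""

    @abstractmethod
    def sum_by_key(self, records: Iterable[Record]) -> Dict[str, float]:
        ...


class InMemoryFabric(Fabric):
    """Toy stand-in for an in-memory engine: one pass over the data."""

    def sum_by_key(self, records: Iterable[Record]) -> Dict[str, float]:
        totals: Dict[str, float] = defaultdict(float)
        for key, value in records:
            totals[key] += value
        return dict(totals)


class PartitionedFabric(Fabric):
    """Toy stand-in for a MapReduce-style engine.

    Records are routed to partitions by key, each partition is
    reduced on its own, and the partial results are merged.
    """

    def __init__(self, partitions: int = 2) -> None:
        self.partitions = partitions

    def sum_by_key(self, records: Iterable[Record]) -> Dict[str, float]:
        buckets: List[List[Record]] = [[] for _ in range(self.partitions)]
        for key, value in records:
            # Every record with the same key lands in the same bucket.
            buckets[hash(key) % self.partitions].append((key, value))
        merged: Dict[str, float] = {}
        for bucket in buckets:
            partial: Dict[str, float] = defaultdict(float)
            for key, value in bucket:
                partial[key] += value
            merged.update(partial)  # keys never overlap across buckets
        return merged


def sales_by_product(fabric: Fabric, sales: Iterable[Record]) -> Dict[str, float]:
    """Business logic, written once against the Fabric interface."""
    return fabric.sum_by_key(sales)
```

Calling `sales_by_product(InMemoryFabric(), sales)` and `sales_by_product(PartitionedFabric(), sales)` on the same input returns the same totals. Moving between fabrics is a change at the call site, not in the application logic.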
Big data applications can also be adapted to changing business service level agreements (SLAs). Nakamura cites the example of an online retailer whose marketing department wants information on product sales performance every five hours, but then comes back to IT with a new request to see this information every 30 minutes. "Because big data historically only runs at one speed, this is a major challenge when it comes to writing big data applications," said Nakamura. "But with the 'write once' capability that products like Concurrent's Cascading 3.0 deliver, the application developer can focus on the intellectual property that the company wants to develop and on the data products he produces, without worrying about the underlying infrastructure."
"I'm proud to see how Cascading has enabled thousands of developers and businesses to be successful at what they do," added Chris Wensel, Concurrent's Founder and CTO. "Cascading 3.0 will enable our users even further by simplifying application development, accelerating time to market, and allowing enterprises to leverage existing, and more importantly, new and emerging data infrastructure and programming skills."
Products like this couldn't be more timely, because enterprises are expecting more from big data than they were six months ago — and full-blown application development beyond simple query capabilities is just around the corner.