Software development using big data is no different from any other kind of software development. Organizations expect quick turnarounds; business requirements change rapidly; and IT must negotiate multiple networks and operating systems, along with the plethora of software and hardware platforms that enterprise applications traverse.
In one sense, this is initially easier in the big data world, where enterprises are simply running on Hadoop and not trying to reach out to other enterprise systems across the present "divide" that separates big data from other types of data processing. Even so, interoperability issues remain within this more constricted big data universe.
A fabric softener for big data?
These issues begin with the fact that there is more than one distribution of Hadoop. Hadoop service providers include Cloudera, Hortonworks, MapR Technologies, Amazon, Microsoft, Rackspace, Intel, IBM, Altiscale, Qubole, and others. Depending on which one you select, the underpinnings of any application you develop will be slightly different. This won't matter much if an organization remains focused on a query-only approach to big data that sticks with languages like Hive or Pig. But if the organization is intent on developing enterprise-strength applications that run off big data, having to move between different underlying Hadoop fabrics matters.
"Our goal is to make it easy for developers to build data applications on top of Hadoop," said Gary Nakamura, CEO of Concurrent, which provides big data application infrastructure solutions. "The underlying structures of Hadoop can be highly complex, but if you construct an application development framework on top of it that can map to any underlying Hadoop fabric with the use of APIs (application programming interfaces), this frees the developer to focus on the layer of the application that contains the business logic."
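The separation Nakamura describes can be sketched in miniature. The following is a hypothetical illustration, not Concurrent's actual API: the business logic is written against an abstract pipeline, and a thin adapter layer maps that pipeline onto whichever underlying fabric is available.

```python
# Hypothetical sketch of a fabric-agnostic application layer.
# All class and function names here are illustrative assumptions,
# not Cascading's real API.
from abc import ABC, abstractmethod

class Fabric(ABC):
    """Adapter for one underlying execution fabric (MapReduce, in-memory, ...)."""
    @abstractmethod
    def run(self, steps, records):
        ...

class InMemoryFabric(Fabric):
    """A trivial fabric that applies each step to the dataset in local memory."""
    def run(self, steps, records):
        for step in steps:
            records = list(step(records))
        return records

class Pipeline:
    """Business logic lives here; it never references a specific fabric."""
    def __init__(self):
        self.steps = []

    def add_step(self, fn):
        self.steps.append(fn)
        return self

    def run_on(self, fabric, records):
        # The same pipeline definition can be handed to any Fabric adapter.
        return fabric.run(self.steps, records)

# Define the application logic once...
sales = Pipeline()
sales.add_step(lambda recs: (r for r in recs if r["qty"] > 0))  # drop zero-qty rows
sales.add_step(lambda recs: sorted(recs, key=lambda r: r["sku"]))

# ...then bind it to a fabric only at deployment time.
result = sales.run_on(InMemoryFabric(),
                      [{"sku": "B", "qty": 2}, {"sku": "A", "qty": 0},
                       {"sku": "A", "qty": 5}])
print(result)  # → [{'sku': 'A', 'qty': 5}, {'sku': 'B', 'qty': 2}]
```

Swapping in a different `Fabric` subclass (say, one that submits MapReduce jobs) would leave `Pipeline` and its steps untouched, which is the point of the approach.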
Relieving big data application developers of underlying "fabric anxiety" gives IT flexibility in moving from one big data computational fabric to another, because it no longer has to factor the tedium of application migration into its plans. In the future, this means that depending on the business need, you will be able to run a big data application in-memory, on Apache MapReduce, or on other big data computational fabrics. Concurrent calls this "write once — and deploy on your fabric of choice."
Big data applications can also be adapted to changing business service level agreements (SLAs). Nakamura cites the example of an online retailer whose marketing department wants information on product sales performance every five hours, but then comes back to IT with a new request to see this information every 30 minutes. "Because big data historically only runs at one speed, this is a major challenge when it comes to writing big data applications," said Nakamura. "But with the 'write once' capability that products like Concurrent's Cascading 3.0 deliver, the application developer can focus on the intellectual property that the company wants to develop and on the data products they produce — without worrying about the underlying infrastructure."
"I'm proud to see how Cascading has enabled thousands of developers and businesses to be successful at what they do," added Chris Wensel, Concurrent's Founder and CTO. "Cascading 3.0 will enable our users even further by simplifying application development, accelerating time to market, and allowing enterprises to leverage existing, and more importantly, new and emerging data infrastructure and programming skills."
Products like this couldn't be more timely, because enterprises are expecting more from big data than they were six months ago — and full-blown application development beyond simple query capabilities is just around the corner.
Mary E. Shacklett is president of Transworld Data, a technology research and market development firm. Prior to founding the company, Mary was Senior Vice President of Marketing and Technology at TCCU, Inc., a financial services firm; Vice President of Product Research and Software Development for Summit Information Systems, a computer software company; and Vice President of Strategic Planning and Technology at FSI International, a multinational manufacturing company in the semiconductor industry. Mary is a keynote speaker and has more than 1,000 articles, research studies, and technology publications in print.