
The next big thing in big data: Plug-and-play analytics

As more companies look to analytics and data-mining models to extract useful information from big data, a better way is needed to share these models between applications.

While there are well-established tools for building analytics and data-mining models to help businesses spot fraudulent transactions or recommend follow-up purchases to customers, plugging these models into applications can be a painful process.

As more businesses call upon these models to interrogate increasingly large datasets, it will become necessary to have an easy way to export and share these models between applications.

Sean Owen, director of data science at Hadoop specialist Cloudera, expects the next big growth area in big data will be in tools that make it simpler to share these models between applications.

"It seems to be the common problem, the wheel that keeps getting reinvented by customers," he said.

"The default thing to do is someone makes a model in [the statistical modelling language] R and they say 'Here's a bunch of coefficients, go program this into some Java code and use this on the website'.

"That requires some expertise on behalf of the developer too, it's very manual.

"They need something that the web service can ask in some standard simple way 'Here's a new data point, classify it for me'."

One candidate for a standardised way to share these models is the Predictive Model Markup Language (PMML) – an XML-based language for representing data mining and statistical models.

PMML can represent not only the statistical techniques used to learn patterns from data, such as artificial neural networks and decision trees, but also pre-processing of raw input data and post-processing of the model output.
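To make the idea concrete, the sketch below shows how an application might read a linear regression stored as PMML and score a new data point without anyone hand-coding coefficients. The PMML fragment is hand-written and simplified, the field names (age, visits, spend) and coefficient values are invented for illustration, and the helpers load_linear_model and score are hypothetical rather than part of any PMML library. A real scoring engine would also apply the pre- and post-processing steps, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written PMML document. Field names and numbers are
# illustrative assumptions, not taken from the article or a real model.
PMML_DOC = """<PMML version="4.4" xmlns="http://www.dmg.org/PMML-4_4">
  <Header description="Hypothetical customer spend model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="visits" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="age"/>
      <MiningField name="visits"/>
      <MiningField name="spend" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
      <NumericPredictor name="visits" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# Namespace map so element lookups match the PMML namespace declared above.
NS = {"pmml": "http://www.dmg.org/PMML-4_4"}


def load_linear_model(pmml_text):
    """Return (intercept, {field: coefficient}) from a PMML regression table."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        predictor.get("name"): float(predictor.get("coefficient"))
        for predictor in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(model, record):
    """Apply the linear model to one data point given as a dict of field values."""
    intercept, coefficients = model
    return intercept + sum(c * record[name] for name, c in coefficients.items())


model = load_linear_model(PMML_DOC)
print(score(model, {"age": 40.0, "visits": 3.0}))  # 10 + 0.5*40 + 2*3 = 36.0
```

The point of the design is that the coefficients live in the exchanged file, so swapping in a retrained model means replacing the document rather than editing application code.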

A wide range of data mining tools can import or export models as PMML, and the standard itself is developed by the Data Mining Group, a vendor-led consortium whose members include IBM, MicroStrategy, SAS and SPSS.

Developing a standard way of representing and interacting with these models would be a "big deal" in the coming year, said Owen.

"You would think there would be a server for this and there really isn't. SAS has an expensive proprietary tool that does that and there's one open source package that kind of does it," he said.

"If I've got a model, surely I should be able to load it up in something and then query it with standard APIs and client libraries? We need to standardise and have a suite of mature solutions to do this."
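As a rough illustration of the "load it, then query it" pattern Owen describes, the sketch below keeps models in memory by name and answers JSON requests of the form "here's a new data point, classify it". Everything in it is an assumption made for the example: the ModelRegistry class, the JSON request shape, the feature names and the choice of a logistic model with a 0.5 threshold are hypothetical, and no real product's API is being reproduced.

```python
import json
import math


class ModelRegistry:
    """Hypothetical in-memory model server: load models by name, query with JSON."""

    def __init__(self):
        self._models = {}

    def load(self, name, intercept, coefficients):
        # Store a logistic model as an intercept plus per-feature coefficients.
        self._models[name] = (intercept, dict(coefficients))

    def handle(self, request_json):
        # Expected request shape (an assumption for this sketch):
        # {"model": "<name>", "data": {"<feature>": <number>, ...}}
        request = json.loads(request_json)
        name = request["model"]
        if name not in self._models:
            return json.dumps({"error": "unknown model: " + name})
        intercept, coefficients = self._models[name]
        z = intercept + sum(c * request["data"][f] for f, c in coefficients.items())
        probability = 1.0 / (1.0 + math.exp(-z))
        label = "fraud" if probability >= 0.5 else "ok"
        return json.dumps(
            {"model": name, "label": label, "probability": round(probability, 4)}
        )


registry = ModelRegistry()
# Illustrative coefficients only; a real model would be exported by a modelling tool.
registry.load("fraud", -4.0, {"amount_thousands": 1.0, "foreign": 2.0})

reply = registry.handle(
    json.dumps({"model": "fraud", "data": {"amount_thousands": 5.0, "foreign": 1.0}})
)
print(reply)
```

A web service would wrap handle behind an HTTP endpoint, so that any client able to send JSON could request a classification without knowing how the model was built.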

About

Nick Heath is chief reporter for TechRepublic UK. He writes about the technology that IT decision-makers need to know about, and the latest happenings in the European tech scene.

1 comment
Michael Zeller

Excellent article, absolutely agree that we need a common standard and process for the deployment and integration of advanced predictive analytics. Eliminating the need for custom code, the PMML industry standard is the right way to enable vendor-neutral, cross-platform capabilities.

To add to the above, Zementis already offers several solutions for the rapid deployment of PMML models in the context of big data and real-time applications. Using standard APIs, our ADAPA scoring engine is available on the AWS Cloud Marketplace or for in-house deployment. If you prefer a SQL-based approach, our Universal PMML Plug-in (UPPI) supports model execution on Hadoop (Hive and Datameer), IBM PureData (Netezza), Pivotal Greenplum, Teradata & Aster, and SAP Sybase IQ.

I invite you to check out more of the PMML benefits at http://www.zementis.com