Data analysts know the problem all too well: Two sets of data need to be merged, but they're formatted completely differently. Getting them to play well together ends up taking a lot of extra time—and money.
Imagine if all those datasets were formatted the same way. Then imagine that they were all housed in one central repository that allowed anyone in a company to pull, filter, and analyze data from other departments.
SEE: Big data and IoT matter to 56% of organizations (Tech Pro Research)
Big data isn't going away, and the currently fractured landscape will have to come together for the field to truly make progress. That is exactly what IBM hopes to accomplish with its newly announced Project DataWorks.
What is it?
Project DataWorks is the newest part of IBM's cloud platform, Bluemix. It uses Apache Spark's machine learning capabilities to perform much of the heavy lifting automatically in the cloud. IBM further claims DataWorks can process data from all possible endpoints, such as IoT devices, weather data, social media, and enterprise databases.
Cognitive-assisted machine learning technology also helps DataWorks process information faster—reportedly up to speeds in the hundreds of Gbps. With that kind of speed businesses should be able to get the answers they need from data in record time.
SEE: IBM ups enterprise appeal of Swift with new Bluemix runtime (TechRepublic)
One system to rule them all?
DataWorks isn't just designed to be fast: Its core purpose is giving big data a universal format. In an IBM world, data silos, mismatched fields, and ungoverned data would be a thing of the past.
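To make the "universal format" idea concrete, here is a minimal sketch of the kind of normalization such a platform would automate: two departments store customer records with different field names and date formats, and mapping both into one shared schema lets the records be pooled and queried together. All field names and records below are hypothetical illustrations, not part of the DataWorks API.

```python
from datetime import datetime

# Hypothetical records from two departments with mismatched schemas.
sales_records = [
    {"cust": "Acme Corp", "signup": "09/01/2016", "rev": 1200},
]
support_records = [
    {"customer_name": "Acme Corp", "joined": "2016-09-01", "open_tickets": 3},
]

def normalize_sales(rec):
    # Map sales fields onto the shared schema; dates become ISO format.
    return {
        "customer": rec["cust"],
        "date": datetime.strptime(rec["signup"], "%m/%d/%Y").date().isoformat(),
        "revenue": rec["rev"],
    }

def normalize_support(rec):
    # Map support fields onto the same shared schema.
    return {
        "customer": rec["customer_name"],
        "date": rec["joined"],  # already ISO formatted
        "open_tickets": rec["open_tickets"],
    }

# One shared repository that any department could filter and analyze.
repository = [normalize_sales(r) for r in sales_records] + \
             [normalize_support(r) for r in support_records]

for row in repository:
    print(row)
```

Doing this by hand for every pair of datasets is exactly the preparation work—field mapping, date wrangling, deduplication—that a standard platform aims to eliminate.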
DataWorks was given a trial run at RSG Media, an analytics company serving the media and entertainment industries. According to IBM, RSG was able to leverage DataWorks efficiently enough to save a single network client $50 million, all through greater efficiency in analyzing audience preference data.
Bob Picciano, senior VP at IBM Analytics, thinks big data is at an inflection point. "Users spend up to 80 percent of their time on data preparation, no matter the task, even when they are applying the most sophisticated AI." The reason for that prep time? No standard platform.
SEE: 6 myths about big data (TechRepublic)
If IBM's claim of faster processing from all possible endpoints is true, that 80 percent figure could shrink dramatically, allowing analysts and business professionals to get results far more quickly than before.
Faster insights could, in turn, result in businesses being more agile. We're in a data-driven world now, and the amount of data in it is only going to grow. Industry standardization is needed if businesses are going to make the most of it going forward, and IBM just might be taking that first step.
The 3 big takeaways for TechRepublic readers
- IBM announced its new DataWorks platform, a big data analysis system that aims to bring all sorts of data together in one standard format.
- DataWorks uses Apache Spark and Watson's AI machine learning systems to analyze input data at reported speeds in the hundreds of Gbps, which IBM claims is faster than any other available platform.
- DataWorks can reportedly analyze data from any possible endpoint: enterprise databases, IoT hardware, social media, and other systems can all be merged, compared, and analyzed.
Also see
- Big data booming, fueled by Hadoop and NoSQL adoption (TechRepublic)
- Big data's biggest problem: It's too hard to get the data in (ZDNet)
- How your big data career is killing other jobs (TechRepublic)
- Getting results from big data analytics, without big upfront costs (ZDNet)
- The "big data" app that predicts employees' health (CBS News)
Brandon Vigliarolo has nothing to disclose. He does not hold investments in the technology companies he covers.
Brandon writes about apps and software for TechRepublic. He's an award-winning feature writer who previously worked as an IT professional and served as an MP in the US Army.