While “big data” can be a misunderstood buzzword in tech, there’s no denying that the recent AI and machine learning push is dependent on the labeling and synthesis of huge amounts of training data. A new trend report by advisory firm Ovum predicts that the big data market–currently at $1.7 billion–will swell to $9.4 billion by 2020.
So what do data insiders see happening in the coming year? TechRepublic spoke to several leaders in this field to find out.
Here are five big data trends to watch in 2017, from the experts.
1. AI and machine learning will increase the need for big data analytics
There’s no question that the AI boom depends on data labeling and analysis. “Machine learning has really come along,” said Carla Gentry, a data scientist in Louisville, KY. “2017 will be the year we see more expertise, but still it will struggle, with understanding, proper usage and talent.”
“IoT on the other hand, will surge with toys, car accessories, home and security uses but it will also set up nasty hackers with lots more access to our private lives,” Gentry said.
Monte Zweben, co-founder and CEO of Splice Machine, has a background in AI. “AI applications powered by machine learning depend on data to develop more predictive models,” said Zweben. “The more data, and, even more importantly, more data that represents the concepts you need to learn, makes AI applications better.”
For example, said Zweben, “the more electronic medical records a system sees that reflect dangerous sepsis events in hospitals, the better a system can predict them before they happen.”
Big data, according to Tony Baer, principal analyst for information management at Ovum, “has emerged from its infancy to transition from buzzword to urgency for enterprises across all major sectors.”
“The growing pains are being abetted by machine learning, which will lower barriers to adoption of big data-enabled analytics and solutions,” said Baer, “and the growing dominance of the cloud, which will ease deployment hurdles.”
2. Self-service big data tools hitting the web
With advances in data processing and cloud applications, there is a plethora of free data platforms online that make organizing and synthesizing data easy–even for beginners.
“Every platform is becoming cloud-available,” said Zweben. “Even big data platforms like Splice Machine are available now as a self-service platform. You specify how much storage and compute you need and databases appear in the cloud for both your apps and data warehouses to use in minutes. There are no wires, racks, networks, or servers to configure,” he said.
Michael Cavaretta, director of analytics infrastructure at Ford Motor Company, said he also sees this as a trend that will continue in 2017.
“Cloud implementations of big data are increasing in popularity as it drives down the entry cost for these technologies,” Cavaretta said. “For many, building a big data stack just isn’t cost effective–particularly for startups–and works best when the majority of the data can be hosted on a single instance.”
3. Analytics are still struggling to keep up
But even with all the great tools and data warehouses, analytics remain complicated. “Even with giant data warehouses now available on Big Data like Hadoop and Spark, companies still struggle to transfer data from operational systems to analytical systems,” said Zweben. “[...] that gap and enable the seamless combination of both workloads.”
“Analytics will always struggle to keep up,” said Cavaretta. “As more data and better algorithms become available, more automation is possible along with better predictions. As the methods disseminate, they become the cost of doing business, driving more analytic innovation.”
4. Data cleansing becoming an industry
Before training data can be fed into machine learning systems, it must first be cleansed: the information in a database is checked for formatting errors, duplicate records, and similar problems. “Machine learning systems are only as good as the data they train on,” said Zweben, “and the secret is transforming raw operational data into learnable features.” The fact that someone visited an online shoe retailer, for instance, “is useful,” he said. “But knowing they went there today is invaluable.”
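To make the shoe-retailer example concrete, here is a minimal Python sketch of the two steps described above: cleansing raw records, then turning them into a recency feature. Everything in it is an illustrative assumption rather than anything Zweben or Splice Machine described: the sample visit log, the site name, the fixed "today" date, and the function names `cleanse` and `to_features` are all hypothetical.

```python
from datetime import date, datetime

# Hypothetical raw visit log: (customer_id, site, visit date as text).
RAW_VISITS = [
    ("c1", "shoes.example", "2017-01-09"),
    ("c1", "shoes.example", "2017-01-09"),  # exact duplicate
    ("c2", "shoes.example", "2016-11-02"),
    ("c3", "shoes.example", "09/01/2017"),  # date in an unexpected format
]


def cleanse(rows):
    """Drop exact duplicates and rows whose date does not parse as YYYY-MM-DD."""
    seen = set()
    clean = []
    for row in rows:
        if row in seen:
            continue  # duplicate record
        seen.add(row)
        customer_id, site, raw_date = row
        try:
            visit_date = datetime.strptime(raw_date, "%Y-%m-%d").date()
        except ValueError:
            continue  # formatting error: skip rather than guess the intent
        clean.append((customer_id, site, visit_date))
    return clean


def to_features(rows, today):
    """Turn cleansed visits into one recency feature set per customer."""
    features = {}
    for customer_id, _site, visit_date in rows:
        days = (today - visit_date).days
        previous = features.get(customer_id)
        # Keep only the most recent visit for each customer.
        if previous is None or days < previous["days_since_visit"]:
            features[customer_id] = {
                "days_since_visit": days,
                "visited_today": days == 0,
            }
    return features


clean_rows = cleanse(RAW_VISITS)
features = to_features(clean_rows, today=date(2017, 1, 9))
```

In this sketch, the raw fact "visited the site" becomes `days_since_visit` and `visited_today`, which is the kind of learnable feature the quote points to. A real pipeline would likely repair or flag malformed rows instead of silently dropping them.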
5. Democratization of data
Jim Adler, head of data at the Toyota Research Institute, has previously talked to TechRepublic about how data doesn’t live in lakes. Rather, “it lives in silos where accountability, focus, and mission are clear,” said Adler. “Server-less, micro-service architectures are making it increasingly easy for these silo-owners to access, analyze, and manage their data without racking servers, configuring virtual machines, or even paying by the hour. Going serverless allows data owners to focus on their data application and pay just for what they use–by the minute.”