Hyperbole is the norm when talking about how businesses can transform themselves using Internet of Things and big data. What you hear less often is just how difficult it can be to get these projects right.
After more than 10 years of working on big data and Internet of Things (IoT) programs, analytics firm Teradata has seen how much effort is required to mine useful insights from webs of interconnected sensors.
Sprinkling IoT sensors throughout your firm won't necessarily give you a snapshot of what you need to know, at least not without a lot of work, said Martin Wilcox, who leads Teradata's centre of excellence.
"It's about the sensor data, stupid," he told the Strata + Hadoop World conference in London, going on to warn:
"The data those sensors produce is an unreliable, unwilling, and in some cases downright deceitful, witness to the events we care about."
Here are the five hard truths about IoT that Wilcox said businesses need to take on board.
1. Sensors sometimes lie
"This often comes as a big surprise to business and IoT people, who tend to assume that, because smart devices never come to work hungover or distracted after a row with a partner, everything they record must be complete, consistent and accurate," he said.
"But if you talk to the hardware engineers that maintain sensor networks, you'll discover that nothing could be further from the truth."
Given a large enough deployment of sensors, the accuracy of the data they collect will drift over time, as the hardware degrades, he said.
In harsh environments, for instance where oil field sensors measure temperature in a hot desert, this degradation can happen quite rapidly.
These compromised sensors can't easily be replaced "because while the sensors themselves are so cheap they're almost free, the cost of the lost production incurred in replacing them most definitely is not".
One way to counter the increasing unreliability of sensor data over time is to corroborate each sensor's data with that of its neighbours, said Wilcox, who suggested creating a "virtual sensor from a neural network of adjacent sensor readings".
"The important thing to understand is that this sensor data needs to be managed. We can't assume that machine-generated data is complete, consistent and accurate, just because it was generated by a machine."
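Wilcox's suggested fix is a neural network built from adjacent sensor readings; as a much simpler illustration of the same idea, the sketch below stands in a median of neighbouring readings for the learned model. The function names, tolerance value, and readings are all illustrative, not from Teradata:

```python
import statistics

def virtual_reading(neighbour_readings):
    """Estimate the 'true' value at a location from adjacent sensors.

    A deliberately simple stand-in for the neural-network approach
    Wilcox describes: the median of neighbouring readings is robust
    to a single drifting sensor.
    """
    return statistics.median(neighbour_readings)

def is_suspect(reading, neighbour_readings, tolerance=5.0):
    """Flag a sensor whose reading disagrees with its neighbours."""
    return abs(reading - virtual_reading(neighbour_readings)) > tolerance

# A drifting desert temperature sensor reports 58 degrees C while its
# neighbours cluster around 46 degrees C.
print(is_suspect(58.0, [45.5, 46.0, 46.5, 47.0]))  # → True
print(is_suspect(46.2, [45.5, 46.0, 46.5, 47.0]))  # → False
```

The point of the sketch is the management step Wilcox calls for: machine-generated readings are cross-checked against their neighbours rather than trusted outright.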
2. Sensors can obscure the bigger picture
Sensors often sit behind machines that filter and aggregate the data they collect.
There are good reasons to ditch irrelevant data, but sometimes the data you thought was chaff later turns out to be valuable, particularly once it is combined with other data.
"It's precisely because what is noise for one application may be vitally important signals for another that at the very minimum we need to understand where and how sensor data has been summarised and filtered," he said.
"In very many cases, our ambition should be to try and capture this raw sensor data and avoid this kind of premature summarisation."
3. Adding sensors is the easiest part
Sensors typically don't measure what you're interested in; they collect data from which you can infer the information you want.
Wilcox gave the example of a fitness band that tells you about the quality of your sleep, based on your pulse and nocturnal movements.
"The wearable device on your wrist isn't directly measuring your sleep cycle. To do that it would need to be connected to electrodes that were attached to your head and measuring brainwave activity," he said.
As is the case with most IoT projects, that inference of what you really want to know depends on a computer model designed to interpret the data.
Building that model and then tuning it typically requires a lot of work up front, as well as aggregate data from very many sensors, he said.
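The gap between what a sensor records and what you want to know can be sketched in a few lines. The thresholds and labels below are purely illustrative, not how any real fitness band works; the point is that sleep state is inferred from proxy signals, never measured directly:

```python
def sleep_stage(pulse_bpm, movements_per_min):
    """Toy inference of sleep state from proxy signals.

    The band never measures sleep itself; it infers it from pulse and
    nocturnal movement. Real products tune models like this against
    aggregate data from many wearers.
    """
    if movements_per_min > 5:
        return "awake"
    if pulse_bpm < 55 and movements_per_min == 0:
        return "deep"
    return "light"

print(sleep_stage(72, 8))  # → awake
print(sleep_stage(52, 0))  # → deep
print(sleep_stage(60, 2))  # → light
```

Even this toy version shows where the up-front work goes: choosing the proxy signals and tuning the cut-offs is the model-building effort Wilcox describes, and the sensors themselves are the easy part.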
4. Extracting useful information is anything but simple
Teradata has worked on a variety of major analytics programs that rely on sensor data. These include helping the US Army to predict failures in its helicopters and Europe's largest train operator to forecast train breakdowns.
What these projects had in common was the complex set of analytics needed to make the programs work, said Wilcox. Collecting the data was only the start.
Wilcox detailed the many analytical steps needed to extract meaningful information from sensor data: time-series analytics to spot significant changes in state; text analytics on engineering reports to label those significant events; path analytics to understand the sequences that led up to them; and graph and association analytics to understand the relationships between components and events.
"The process of creating a useful and useable dataset from time-series data is often anything but simple," he said.
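The first of those steps, spotting significant changes of state in a time series, can be illustrated with a crude sketch. This is not Teradata's method, just a minimal example that flags readings departing sharply from a rolling baseline; the window size, threshold, and sample readings are all assumptions:

```python
def change_points(series, window=3, threshold=2.0):
    """Return indices of readings that depart from the mean of the
    preceding `window` readings by more than `threshold` — a crude
    stand-in for the time-series analytics step."""
    flagged = []
    for i in range(window, len(series)):
        baseline = sum(series[i - window:i]) / window
        if abs(series[i] - baseline) > threshold:
            flagged.append(i)
    return flagged

# Pressure readings that step up to a new regime partway through:
readings = [30.1, 30.0, 29.9, 30.2, 35.8, 36.1, 36.0, 35.9]
print(change_points(readings))  # → [4, 5]: the step change and its aftermath
```

In practice this is only the first stage: the flagged events would still need labelling against engineering reports and linking to component and failure data, which is where most of the complexity lives.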
5. By itself, sensor data is usually useless
Sensor data is only ever useful when combined with that from other sensors and information about the wider context, said Wilcox, giving an example based on the train operator Teradata worked with.
"Say an oil pressure sensor on a train temporarily exceeds a threshold value. Should we worry, or regard it as a blip?" he said.
Determining whether the train is about to break down requires comparing the current readings with past sensor, operations, and maintenance data to look for correlations with previous failures.
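A first, much-simplified filter for the "blip or worry?" question is to ask whether the threshold breach is sustained rather than momentary. The sketch below is an illustration of that one idea only, with invented readings and limits; as the article notes, a real answer also needs historical and operations data:

```python
def sustained_exceedance(readings, limit, min_consecutive=3):
    """Distinguish a momentary blip from a sustained problem: only
    worry if the limit is breached `min_consecutive` times in a row."""
    run = 0
    for value in readings:
        run = run + 1 if value > limit else 0
        if run >= min_consecutive:
            return True
    return False

blip      = [4.8, 5.3, 4.9, 4.7, 4.8]  # one spike above a 5.0 limit
sustained = [4.8, 5.2, 5.3, 5.4, 5.1]  # pressure stays high

print(sustained_exceedance(blip, 5.0))       # → False
print(sustained_exceedance(sustained, 5.0))  # → True
```

Even when this filter says "worry", the questions that follow, how imminent the failure is, where the nearest engineer is, whether a spare part is available, can only be answered by joining the sensor data with data from elsewhere in the organisation.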
Even if the train operator could establish that the locomotive was on the verge of failure, that insight wouldn't be enough. They would also need to know whether the failure was imminent and needed fixing en route, or could be dealt with when the train reached its destination.
Once the nature of the problem was established, the company would then need more information on where the nearest qualified engineer was, when they could get to the train and whether a spare part was available.
"We can't answer any of those questions just by relating to the sensor data alone. We need the operations data, the HR data and more," he said.
"So we see that the truism that 'data loves data' is especially the case for sensor data. Failure to plan for the integration of sensor data, with other sensor data and data from around the organisation, amounts to planning to fail with your IoT initiative."
Nick Heath is chief reporter for TechRepublic. He writes about the technology that IT decision makers need to know about, and the latest happenings in the European tech scene.