Big Data

Big data is now economics and not just technology

Bill Schmarzo, the chief technology officer of IoT and analytics at Hitachi Vantara, sits down with TechRepublic's Tonya Hall to talk about how cleaning the right data using machine learning can yield a positive ROI in your AI.

Hitachi Vantara CTO Bill Schmarzo tells TechRepublic's Tonya Hall all about how cleaning the right data using machine learning can yield a positive ROI in your AI. The following is an edited transcript of the interview.

Tonya Hall: Are Big Data and Analytics all about the technology, or is there something bigger? Welcome, Bill.

Bill Schmarzo: Thanks for having me.

Hall: What does Hitachi Vantara do?

Schmarzo: Think of us as the digital arm of Hitachi Limited. That is, we are providing data, analytics, and applications around IoT, not only for Hitachi itself, but also for all of our customers.

Hall: You've written two books, Big Data and Big Data MBA. Tell us what those books are about.

Schmarzo: The second book, Big Data MBA, is actually the textbook I use for the class I teach at the University of San Francisco, which is also called "Big Data MBA." The premise behind that class is that tomorrow's business leaders will need to embrace analytics as a business discipline. The days when we could turn over analytics to somebody else in the organization are over. Leading organizations, especially those leading in digital transformation, are the ones embracing analytics as a way to differentiate their business and to fundamentally change their business models.

Hall: How does a company become more digital?

Schmarzo: Great question. It doesn't start with technology. It starts with companies really understanding, amongst their customers in the market: What are the sources of value creation? Where is value created in the marketplace? And then, how do I leverage digital assets, in particular data, analytics, and intelligent applications, to capture and monetize those high-value situations?

It really starts with a very intimate knowledge of your customers, what it is they're trying to accomplish, and where the points of value are. And then, a very thorough understanding of your own internal value capture processes, so you understand how you're going to capture those points of customer value creation. Sounds kind of complicated, doesn't it?

Hall: Well, that's the question, then, I guess: How do you decide what data you should collect? I mean, how do you assign value to the data?

Schmarzo: Great question. We actually did a research project at USF on determining the economic value of data. What we found is that the key linkage point for identifying the points of value creation is the decisions your customers are trying to make in their journey map. If you think about somebody who's trying to buy insurance, for example, there's an epiphany moment where the customer realizes, "I need to have insurance." If I were an insurance provider, I would want to make sure that I'm there at that epiphany moment, to help set the agenda. Then the customer's going to go through a variety of different phases and decisions and processes to understand, "What insurance should I buy? What kind do I need? What price am I going to pay? What am I going to cover?" The whole litany of insurance, not only when you buy it, but through the entire life of that policy up until the point in time where either you or the policy expires.

And so, organizations need to understand the journey that's taking place. Where are the points of value creation? Where are the inhibitors of value capture? Then, they can build the kind of internal applications and processes to help capture the points of value and mitigate those inhibitors.

Hall: When digital transformation goes wrong or is unsuccessful, what is often the cause?

Schmarzo: It's really that organizations don't understand what customers are trying to do. They haven't taken the time to really focus in on the decisions that customers are trying to make. One of the key processes that we teach in my class at the University of San Francisco is design thinking, in particular customer journey maps, so that you understand what process a customer is going through. One of the biggest challenges organizations have is they think the customer journey starts when they first contact the customer and ends when the customer leaves. But think about a customer, for example, who's trying to plan a vacation. Think about all the different companies they're going to touch along the way: the resorts, the airline, the car rental, the food and restaurants, and all kinds of different organizations they touch along that path. If you're only thinking about it from the myopic view of what you provide, you are leaving tons of value on the table. You're leaving all kinds of money on the table, and in particular, you might be missing some very close aspects of that customer journey that you could actually monetize yourself.

Hall: How do you tie data back to use cases, for example? Amazon is an expert at this. I mean, can you give an example of using data for customer retention?

Schmarzo: Yeah, Amazon is the master. What Amazon has done is, for every one of their customers, they've built these very detailed analytic profiles. Right? So they know, for every customer, what products you're likely to buy, when you're likely to buy them, what you're going to buy in combination. They know all about you. They've taken all that detailed purchase and engagement data, and they've used it to make predictive models about what you're likely to do. That is really the starting point: If I know what customers are likely to do, then I can get there first. If I think you're going to need to buy insurance, I can get there first. If I think you're planning a vacation, I can get there first. And so, the way you drive that digital transformation process is, you understand the decisions your customers are trying to make. Then you build the analytics that you need to support that. Think predictive analytics to predict what they're likely to do, and prescriptive analytics to prescribe recommendations. I mean, Amazon's the master of recommendations. So are Netflix and Spotify.

Once you identify the decisions you're trying to make and the predictions you're trying to make, that'll tell you what data you need to have. There's a whole process for bringing the business stakeholders into that, so that you can brainstorm what data you might need. Because what you're going to find is, while you've got some data internally that's quite important, there may be other external data that can be quite valuable in trying to predict what customers are likely to buy.

Hall: You say that Big Data is small. Explain that.

Schmarzo: Well, that's exactly the point of these individual analytic profiles. Amazon is successful, Netflix is successful, American Express is successful because they know so much about you. They have a deep history of relationships with you. They know what you're likely to do, because they have a history of what you've done previously. And so, it's not about big, it's about getting down to individuals. It's about Tonya. It's about understanding what you want, so that I can better serve you. Right, so I can not only understand what you might be buying from me, but I might be able to identify new monetization opportunities, that is, unmet or under-met needs that you have that no one else is meeting and that I could uniquely meet. So it's really about the individual, the power of one, the power of you, the power of me, and learning as much as possible about that.

Let me extend that one more point. Humans have distinct behaviors, inclinations, propensities, and tendencies that Amazon and leading digital transformation companies are monetizing. IoT devices are the same. Over time, devices have behaviors. They have tendencies and inclinations and propensities. The successful IoT companies are the ones who are going to also build these very detailed analytic profiles. You'll hear them called digital twins, for example. Right? So that I know exactly when I think that product might break down, when I might need to replace a certain widget, who's going to do the work, when I'm going to end-of-life the product. So think about all the decisions you're making about that particular device, and how I can build very detailed profiles and actually keep them in this digital twin repository that helps me to make better decisions.

Hall: You mentioned collecting data over and over again. That's great, and understanding your customers is great. Facebook's been in the news for collecting data. I guess my question is: How do privacy and security factor in? Are we going to trust companies?

Schmarzo: Ooh, that's a good question. Overall, when organizations are building out their digital transformation process, and trying to understand the kind of decisions their customers are trying to make, they need to employ something that we call decision governance. That is, you're going to learn things about people, for example, that you may not want to act on. If you haven't gone through the hypothesis development process to not only identify how you're going to use the data, but also how you're not going to use the data, then you open yourself up to all kinds of privacy and litigation issues. Think about it as, in statistics, the cost of false positives and false negatives. Right? It's easy when you know something, when you have a good idea about it. But what if you're wrong, right? What if you're wrong? You need to understand the cost of false positives and false negatives. Grab your stats books: type I and type II errors, here they come again. But they're really important, as you think about, "How do I make certain that I'm serving my customers without trading in the goodwill I'm building with them?"
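
To make the point about the cost of false positives and false negatives concrete, here is a minimal sketch of weighing the two kinds of error before acting on a prediction. It is an illustration only, not something described in the interview: the function names, the probabilities, and the dollar costs are all hypothetical assumptions.

```python
# Hypothetical illustration: decide whether to act on a prediction by
# comparing the expected cost of a false positive with that of a false negative.

def expected_cost(p_true, fp_cost, fn_cost, act):
    """Expected cost of acting (or not) on a prediction.

    p_true  -- assumed probability that the prediction is correct
    fp_cost -- assumed cost of acting when the prediction is wrong (type I error)
    fn_cost -- assumed cost of not acting when the prediction is right (type II error)
    """
    if act:
        # Acting only hurts when the prediction turns out to be wrong.
        return (1.0 - p_true) * fp_cost
    # Not acting only hurts when the prediction was right.
    return p_true * fn_cost


def should_act(p_true, fp_cost, fn_cost):
    """Act only when acting has the lower expected cost."""
    return expected_cost(p_true, fp_cost, fn_cost, True) < expected_cost(
        p_true, fp_cost, fn_cost, False
    )


def break_even_probability(fp_cost, fn_cost):
    """Probability above which acting becomes the cheaper choice."""
    return fp_cost / (fp_cost + fn_cost)
```

Under these assumptions, a prediction that is 70 percent likely to be right is still not worth acting on if a wrong action costs five times as much as a missed opportunity, which is the kind of trade-off decision governance is meant to surface.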

Hall: What are the latest developments in using machine learning to clean up data?

Schmarzo: That's a really interesting area. In the same way that we're using machine learning to identify patterns and relationships between customers, we can use that same technology to identify patterns and relationships buried in the data. It'll help us to find patterns and relationships that are broken, right? That'll give us a hint as far as what we need to go fix and go after. Because machine learning, whether it's supervised or unsupervised, is all about identifying patterns and relationships that are buried in the data. Again, we use that for humans, we use it for machines and devices. We can actually use it for our own data as well, to help us make sure we understand how clean our data is, and where we need to invest to clean that data up.
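
As a rough illustration of flagging values that break the usual pattern in a data set, the sketch below uses a simple robust-statistics check (median absolute deviation) as a stand-in for a trained machine learning model. The function name, the threshold of 3.5, and the sample readings are assumptions made for the example, not details from the interview.

```python
from statistics import median

# Hypothetical illustration: flag readings that deviate sharply from the
# typical pattern, using the median absolute deviation (MAD). This is a
# simple statistical stand-in, not a trained machine learning model.

def flag_broken_values(values, threshold=3.5):
    """Return the indexes of values whose modified z-score exceeds threshold."""
    values = list(values)
    center = median(values)
    deviations = [abs(v - center) for v in values]
    mad = median(deviations)
    if mad == 0:
        # At least half the values are identical; the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the data is roughly normally distributed.
    return [
        i for i, dev in enumerate(deviations)
        if 0.6745 * dev / mad > threshold
    ]


# Assumed sample: temperature-like readings with one obviously bad record.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 200.0, 20.2]
suspect_indexes = flag_broken_values(readings)
```

The flagged indexes only hint at where to look; whether a flagged record is actually wrong still depends on the use case it supports.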

Hall: In cleaning up this data, I mean, what kind of ROI is resulting? What kind of artificial intelligence are we getting, better artificial intelligence, I should say?

Schmarzo: Well, you asked two questions there. The ROI question is really interesting, because the ROI in cleaning data is only there if you know what use cases you're going after, and the business value of those use cases. If you're doing your ROI at the use case level, then you can clean the data you need to support that use case. You don't want to clean all the data. Not all data is of equal value. Not all data is equally important. Having that use case focus allows you to say, "These are the data sources that are most important. That's where I'm going to spend my data cleansing, my metadata management, my data lineage. I might shrink the latency of the data, I might improve the ..." All the things you do to invest in data as an economic asset stem from understanding the use cases you're going after and what data is most important in solving those use cases.
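
One way to picture the use-case-first approach to data cleansing is to rank data sources by the business value of the use cases that depend on them. The sketch below assumes a crude attribution rule (each source is credited with the full value of every use case that needs it); the use case names, dollar values, and source names are hypothetical.

```python
from collections import defaultdict

# Hypothetical illustration: rank data sources by the total value of the
# use cases they support, to decide where cleansing effort goes first.

def rank_data_sources(use_cases):
    """Return (source, total_value) pairs, highest value first.

    Each use case is a dict with an assumed "value" estimate and the list
    of "data_sources" it needs. Crediting every source with the full use
    case value is a simplifying assumption, not a valuation method.
    """
    totals = defaultdict(float)
    for use_case in use_cases:
        for source in use_case["data_sources"]:
            totals[source] += use_case["value"]
    # Sort by value descending, then by name for a stable order on ties.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Assumed example inputs.
use_cases = [
    {"name": "customer retention", "value": 500_000.0,
     "data_sources": ["purchases", "support_tickets"]},
    {"name": "predictive maintenance", "value": 300_000.0,
     "data_sources": ["sensor_logs", "purchases"]},
]
ranking = rank_data_sources(use_cases)
```

With these made-up numbers, the purchase data ranks first because two use cases rely on it, so it would be the first candidate for cleansing and metadata work.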

Hall: That's great information, Bill. Is there anything else that you'd like to add?

Schmarzo: Yeah. The one thing I would add is that we tend to think about big data and data science and machine learning and artificial intelligence as a technology conversation. I'm going to change the frame entirely here. It's an economics conversation. It's not a technology conversation. Technology is critical, right? We need to understand that there are 27 different types of neural network capabilities out there. But in the end, it's an economics conversation. What am I doing, from an organizational perspective, to identify and capture those sources of value and wealth creation? That's economics.

Hall: Well, thanks again for some insight on digital transformation and the machine learning, cleaning up our data, and better analytics and IoT and all of that. I really appreciate it. If somebody wants to, if they want to connect with you, or maybe they want to find out more about the Cubs, I mean, how are they ... doing, Bill? How are the Cubs-

Schmarzo: Yeah-

Hall: Doing?

Schmarzo: They just dropped a three-game series to the Reds. Are you kidding me? Like the worst team in baseball. I'm not happy about that. But if you want to connect with me, the best place is Twitter. It's @Schmarzo. You also can reach me on LinkedIn. Those are the two best places I hang. I would say Snapchat, but I haven't figured it out. My daughter continues to make fun of me because I haven't figured out Snapchat. I'll just stay with Twitter and LinkedIn.

Hall: Thanks again, Bill, for the insight on Big Data and analytics. If you want to find me, you can. You can find more of my interviews right here on TechRepublic or at ZDNet, or maybe you'll find me on social media. I'm on Twitter, Facebook, LinkedIn, and you can find those by going to my website. In fact, if you'd like to chat, or if you have comments, I hope that you'll find me on Twitter. That's @TonyaHallRadio. I'd love to hear from you. Thanks for watching.


About Tonya Hall

Tonya Hall is a pioneer in new media broadcasting. She hosted a daily broadcast radio show on social media, worked as a producer for one of the largest tech podcasting networks, and continues to bring compelling guests and stories to tech fans across...
