Michael Cavaretta wants the data, the whole data, and nothing but the data. Here's what he does in one of today's hottest IT jobs.
You can't give Michael Cavaretta too much data.
He's been a data scientist at Ford Motor Company for almost 16 years, and especially in the eight years since CEO Alan Mulally's tenure began, data - and lots of it - has been Cavaretta's mantra.
Cavaretta sees his role breaking down in a few ways. First, he's a data scientist. He's also the manager for his group of data scientists.
"It really boils down to getting value from data sources within the company or external to the company. I think the biggest thing is the use of a wide variety of different tools and technologies - so things from computer science, statistics, machine learning," he said.
The other part of what Cavaretta does has to do with communicating what the data means to other people in the company. He sees it as storytelling - whether it be through information graphics or any visualization that "explain[s] it in a way that allows them to understand why you're answering their question in a particular way," he said.
That ties into the last piece of Cavaretta's job: domain knowledge. Cavaretta helps bridge the gap between the data side and the business side. Having an understanding of where other folks are coming from and what their needs are makes for better-used data.
"If people are talking about things coming from the finance perspective, you need to be able to talk the finance language," he said.
In the IT world, understanding business is an increasingly important skill. The meeting point of these two fields is also something that Ford placed a great emphasis on when its recently retired CEO Alan Mulally joined the company in 2006.
His motto: The data will set you free.
Cavaretta received his Ph.D. in computer science, with a concentration in artificial intelligence. Being interested in optimization problems, he focused on a subset of the field called evolutionary algorithms, a way of setting up an artificial population to solve resource problems.
From there, he went to work at a consulting company called Churchill Systems that concentrated on techniques for promotional demand forecasting with high-volume retailers like Sears and Kmart.
When Cavaretta joined the research and innovation laboratory at Ford in 1998, the company was bringing up a data mining team. Since then, the team and the techniques have changed. As it stands, Cavaretta's department, which is one of several, is about 35 people.
After Mulally came aboard in 2006, Cavaretta said it didn't take too many meetings where execs were being asked "where's your data?" for them to start turning to him and his team for help supporting their decisions, etc. "That really changed things where [Mulally] really brought in this data-centric and analytics culture into Ford," he said.
To put the data talk in more concrete terms, there's the tale of the three-blink turn signal. When Ford was bringing the Fiesta from Europe, there was discussion about whether to include a turn signal that blinks three times and then switches off automatically.
"We wanted to know if people are happy with this, but the surveys we built were a little bit mixed, they really didn't give a clear indication," Cavaretta said. Anecdotally, Ford was hearing good things, but the numbers didn't match up. Cavaretta and his team started taking apart how people on the internet were talking about the feature. It turned out the negativity came from the implementation or placement of the signal, not the signal itself.
"We were able to tease apart those two pieces," Cavaretta said.
Over the years, one of the best changes Cavaretta's seen is the idea that it's acceptable to store massive amounts of data. You can't give him too much data, after all. In talking with people like Ford's internal customers, he always pushes for the full scope of the data over averages and aggregations.
"Let's go to the raw stuff," Cavaretta said. "I'm always amazed when people get it all of a sudden. I'll tell them, 'You're taking this average, if we had the raw data, there's a lot of different things we could do.'"
With storage capabilities being terabytes on terabytes, Cavaretta is able to do things the way he prefers. "That is really a big game changer," he said.
In his own words:
How do you unplug?
Work out. Working out is definitely my de-stresser. I have a family, I enjoy spending time with my kids. We're completely into Clash of Clans now, which is actually kind of sad. But really, the big thing for me to really put my mind in a different place is I like to go out for a run, or go out for a walk, or hit the gym.
Is there a job you'd like to do if you weren't a data scientist?
Heck yeah. It goes back to something I should have mentioned before, which is I'm a total foodie. Italian background, my dad was a restaurant manager. We make our own pasta for birthday parties. If I was going to be something completely different from this, then I would love to be a chef.
Do you have any social media guilty pleasures?
I actually had to delete the Reddit app from my phone because I was spending too much time. I have an account on Reddit where I get only the stuff relating to statistics and computer science, and things like that, and while I have learned some really good stuff, I do occasionally jump over to "all" and check out the things put under the "funny" subreddit. Big Data Borat sometimes has some really good stuff. I'm a fan of Brain Pickings.