At the core of autonomous driving research, and of the race to develop driverless vehicles, is one key ingredient: big data. These advances are built on machine learning, a branch of AI, and they rely on the data collected by car companies. That includes data from real miles driven, such as with Tesla’s Autopilot; data from simulations of autonomous driving; and data from test situations, such as Uber’s driverless fleet in Pittsburgh.

Big data, said Michael Cavaretta, director of analytics infrastructure at Ford Motor Company, means data that is “too big to easily handle within your computational resources.” It’s about looking at datasets with “high velocity, high volume and high variety,” he said.

And even as computers have become more powerful and storage cheaper, grappling with this data has become more difficult.

But this data is essential to machine learning, which works by taking in data and learning through feedback loops. “Data and machine learning go together like peanut butter and jelly,” said Cavaretta. “So much better with each other.”

Cavaretta previously led an analytics group within product development, specifically in research and advanced engineering, supporting different functions within Ford. There were several of these groups, such as an analytics group in manufacturing, one in marketing and sales, and so on. “We would do our best to be internal consultants,” he said, “and work with our internal customers to deliver the best value.”

Recently, he said, Ford has undergone a sea change. When the company brought on a new chief data and analytics officer, Paul Ballew, the aim was to centralize Ford’s analytics groups. The new data operations organization has a singular focus on understanding Ford’s internal data, third-party data, potential partnerships, and vehicle data, said Cavaretta. It’s about “having an enterprise view and an enterprise strategy, with regard to Ford’s data, and then the analytics to put on top of it,” he said.

The group realized there was an opportunity to be part of a bigger picture. “We thought, ‘It would be great if we had more communication and looked beyond the immediate needs within a particular silo,’” said Cavaretta. The group presented the executive board with a proposal for a new role, and that’s when Ballew was hired and the Global Data Insight and Analytics group was formed, with a machine learning division.

When Ballew came in, he saw the importance of having an enterprise view of both the data and the analytics sides, said Cavaretta. So Ford has focused on making sure the right roles are filled: “For data engineers, data scientists, and people who can understand both sides,” said Cavaretta. “Now, we’re here.”


Cavaretta transitioned from “being a real data scientist, a person with mostly hands on keyboard leading a group of data scientists, to being in charge of our analytics infrastructure and our data supply chain.”

“Data supply chain,” to clarify, is Ford’s branded term for a data lake. “What we’re looking at is how do we pull together, in a very strategic way, our internal assets, as well as third-party data sources, connected vehicle data, autonomous vehicle data?” said Cavaretta. “How do we put that all in one repository in a way that allows [us] to operate on it efficiently and effectively?”

Autonomous vehicles, in particular, generate a large amount of data.

“We provide the platform to allow them to be able to store massive quantities of data, tag it, search to find the right things to make sure that they can efficiently find where they want to log something or where they want to check out where the algorithms operate,” said Cavaretta. “That allows them to modify the technology to further development.”

So how, exactly, does Ford solve problems with data?

One way is in manufacturing. For parts deliveries, data can show the routes being taken and how efficient each journey is, helping to save time and money. Another area is mobility. “Big Data Drive looked at some of the connected vehicle data that’s generated from electric vehicles,” said Cavaretta. “There’s a huge volume of the data coming in, and you need to be able to have the newer technologies to handle that.”


When it comes to data, Cavaretta said Ford differentiates itself from other big automakers by “pushing for an enterprise-level approach.” Ford’s long-term goal, Cavaretta added, is to build a solid strategy around its entire suite of data assets. “When we want to be able to understand a particular domain, we’d really like to be able to go in and say, ‘How do we get a complete view of that particular domain, whether that be customers or parts or vehicles? How do we put all of that together?’”

TechRepublic asked Cavaretta how Ford goes about hiring people to work on its data. “It’s tough to hire in this area,” Cavaretta said, “but we really believe we’ve got a really good value proposition to people who come into the organization because this is important work. The company has really recognized that. [We’re] looking more [at] the organizational aspects of it rather than just the, ‘We do cool stuff with data and algorithms.’”

“We know AI is out on the horizon, so we want to make sure we’re in front of that,” said Cavaretta. “That’s what you get when you get a large analytic organization.”
