TechRepublic talked to Uber's head of machine learning about what the ride-sharing giant has learned from seven years of collecting and using 'smart' data.
A year ago, Danny Lange took over as the head of machine learning at Uber. The ride-sharing company, which launched in 2009, is, essentially, a tech company: It operates entirely through an app. Lange manages a team in San Francisco, and Uber has a smaller team in Seattle. And machine learning has become the underlying foundation for every part of the company.
"Machine learning and AI technologies can really solve some very fundamental business problems that are really hard to create hardwired solutions to," said Lange. Here's TechRepublic's conversation with Lange about how Uber uses machine learning to operate.
How is Uber's approach to machine learning unique?
The traditional approach has been to hire some PhDs and data scientists, with everybody on their own, team by team. We wanted to create machine learning as an internal service instead. It has a graphical user interface; it has APIs. Every team in the company can use this service, just like they can use data services and computing services. Machine learning is just another tool in the toolbox for the product teams, for the software engineers and the data scientists.
Machine learning has been in our DNA from the beginning. The nature of Uber is this idea of a two-sided marketplace where you have drivers on one side and riders on the other side. The essence of creating an efficient marketplace is really having a lot of these dynamic properties that benefit from machine learning.
Machine learning has been there from the beginning, but what we're doing is to take it a notch up—to make sure that it's not just in one part, say, in marketplace management, but that it's in every part of the company. We have found over time that machine learning does add value in areas where people initially didn't think of machine learning as an option.
How does machine learning work in something like Uber Eats?
At the core of the user experience in a meal service is the time to delivery. You want to know approximately how long it will take for someone to deliver this meal to your doorstep or office.
Initially, that was basically thought about as a classical computation. The distance between you and the restaurant, and the average speed in your town, and then some average time to prepare the meal. That's the classical thinking. But we actually now have the data about how long it takes to make noodles, how long it takes to make a hamburger, and how long it takes to deliver it in different parts of town at different times of day. You can start building machine learning models that can give you a more accurate prediction based on the data, not on some finite computation.
When we rolled that out, we got an overnight improvement in accuracy of 26%. There's a very low friction, very low barrier for the team to say, "Hey, let's deploy more models here." If we know when the restaurant actually started the meal, we have more information. We can actually have a machine learning model that refines your estimated time of delivery even more. You can see how an application, in a short time span, goes from being a hardwired application to becoming a smart and dynamic application that benefits from knowing your behavior and from knowing other people's behavior.
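To make the contrast Lange describes concrete, here is a minimal sketch in Python, not Uber's actual system. It compares a hardwired estimate (distance, average speed, fixed prep time) with an estimate learned from past orders. The function names, the sample orders, and the choice of a simple per-dish, per-hour average as the "model" are all assumptions made for illustration; a production model would be far richer.

```python
from collections import defaultdict
from statistics import mean


def classical_estimate(distance_km, avg_speed_kmh, avg_prep_min):
    """Hardwired estimate: travel time at a citywide average speed plus a fixed prep time."""
    return distance_km / avg_speed_kmh * 60 + avg_prep_min


def fit_group_means(orders):
    """Learn the mean observed delivery time (minutes) per (dish, hour of day)."""
    groups = defaultdict(list)
    for dish, hour, minutes in orders:
        groups[(dish, hour)].append(minutes)
    return {key: mean(values) for key, values in groups.items()}


def learned_estimate(model, dish, hour, fallback):
    """Use the learned average when we have data for this case, else the fallback."""
    return model.get((dish, hour), fallback)


# Hypothetical history: (dish, hour of day, observed minutes to delivery).
orders = [
    ("noodles", 12, 35.0),
    ("noodles", 12, 41.0),
    ("hamburger", 12, 22.0),
    ("hamburger", 19, 30.0),
    ("hamburger", 19, 34.0),
]

model = fit_group_means(orders)
baseline = classical_estimate(distance_km=3.0, avg_speed_kmh=30.0, avg_prep_min=15.0)

print(baseline)                                          # hardwired guess
print(learned_estimate(model, "noodles", 12, baseline))  # learned from data
```

The hardwired formula gives the same answer for every dish and every hour, while the learned estimate shifts as the observed delivery times do.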
What other ways can machine learning improve efficiency?
An important goal for us is to give you as accurate an ETA as possible. When you request a car and we tell you it's going to be 14 minutes or 12 minutes before it shows up, we want to make sure that that estimate is as precise as it can be. We gather information from millions of trips, because we know exactly how long it took for the car to come to you for each trip. We basically use data to build models that estimate the time it will typically take for the car to reach you at any given time of the day, any given time of the week. That is better than any attempt to compute the route and say, "It's going to take the car seven minutes to get to you." It learns from the experience.
We started improving the pickup experience by sometimes asking you to go to a nearby corner. You may order the car in a place where there are restrictions, or maybe it's really hard to stop there, but just 10 yards down is a much better pickup.
It's not like there's someone at Uber saying "We want to command you to get picked up on the corner." We basically gather the data from the pickup experience. We know when the driver is at the destination, we know when the driver clicks that the trip has started, things like that. And then the machine learning system will basically determine where people have low friction pickups, and learn from that.
It's a pretty remarkable thing, because you're talking about millions of pickup places across the country. Here you have to use machine learning to improve a very, very personal experience of walking 10 yards, having a perfect pickup.
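As a rough illustration of learning pickup spots from data, the Python sketch below measures "friction" as the wait between the driver arriving and the trip starting, the two signals Lange mentions, and suggests the nearby spot with the lowest typical wait. The spot names, the event format, and the use of a median are assumptions for the example, not a description of Uber's method.

```python
from collections import defaultdict
from statistics import median


def pickup_friction(events):
    """Median seconds between driver arrival and trip start, per pickup spot.

    events: iterable of (spot_id, arrived_at_s, trip_started_at_s).
    """
    waits = defaultdict(list)
    for spot, arrived, started in events:
        waits[spot].append(started - arrived)
    return {spot: median(w) for spot, w in waits.items()}


def suggest_spot(requested, nearby, friction):
    """Pick the lowest-friction spot among the requested one and its neighbors.

    Falls back to the requested spot when no candidate has history.
    """
    candidates = [requested, *nearby]
    known = [spot for spot in candidates if spot in friction]
    if not known:
        return requested
    return min(known, key=lambda spot: friction[spot])


# Hypothetical history: pickups at the front door are slow, the corner is fast.
events = [
    ("front_door", 0, 180),
    ("front_door", 0, 240),
    ("front_door", 0, 200),
    ("corner", 0, 45),
    ("corner", 0, 60),
    ("corner", 0, 75),
]

friction = pickup_friction(events)
print(suggest_spot("front_door", ["corner"], friction))
```

Nobody hand-labels the corner as better here; the suggestion falls out of the recorded pickups, which is the point Lange makes about scale.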
What are the biggest challenges?
The biggest challenge is a very positive challenge, which is the opportunity to use machine learning to improve Uber's customer experiences. Also, being smart around fraud detection, basically detecting fraudulent behavior as it happens, so that we don't accept rides with a stolen credit card, as an example.
Then there's the whole task of improving maps. We want to make sure that when you request to be picked up in front of your house that we stop in front of your driveway and not the neighbor's driveway. This is a completely different machine learning problem, identifying where that stop off is exactly located, where the front door is. Then it goes onto the challenge of self-driving vehicles, which is also machine learning-based. There's no limit to the use of machine learning within Uber.
Does Uber use its own mapping system?
A lot of existing maps and map services are really good, but there is certain information on those maps that's not important to us, and then there's other information kind of missing from those maps—where they will say, "You are at your destination," but that's within a block of your destination. We need to know more details, so we basically need to enhance the maps for our purpose.
What have been the biggest surprises?
One of the surprises is basically a positive one, which is that among the engineers at Uber, there's a very strong desire to use this kind of technology to improve apps and services. There's a lot of open-mindedness looking at the business challenges and then jumping on board and using a technology that was essentially almost unknown five years ago. I think that's pretty impressive, to see that in such a large engineering organization. As I gave the example with improving the pickup spots, I think it's really incredible that you can use a piece of technology at a scale that no human can do, you know? You cannot have a human sitting and recommending where to get picked up at every corner of the world. Using technology to improve the everyday life of individuals, that's kind of like a goosebumps feeling.
You've been collecting data for all these years. Is there a tipping point where all of a sudden you realize, OK now, three years in or with X amount of data, all of a sudden you've gotten to a high degree of accuracy? How much data do you really need to get very good at that?
There's all this stuff about big data, and that's not what we're talking about here. You really have to look at smart data and clean data. You don't need to go back years and years and years to do this. We talk about decay of data, so basically the data this month is much better than the data from the previous month, but some data you have to look at over a longer time span to get seasonal understanding.
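One common way to express "decay of data" is to weight each observation by its age, so that recent trips count more than old ones. The Python sketch below uses an exponential half-life for this; the half-life scheme, the function name, and the sample numbers are assumptions chosen for illustration, since the interview does not say how Uber weights its data.

```python
def decayed_mean(observations, half_life_days):
    """Average of values where an observation's weight halves every half_life_days.

    observations: iterable of (age_days, value).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for age_days, value in observations:
        weight = 0.5 ** (age_days / half_life_days)
        total_weight += weight
        weighted_sum += weight * value
    if total_weight == 0.0:
        raise ValueError("no observations to average")
    return weighted_sum / total_weight


# Hypothetical pickup times in minutes: one from today, one from a month ago.
observations = [(0, 10.0), (30, 20.0)]

recent_biased = decayed_mean(observations, half_life_days=30)
plain = sum(value for _, value in observations) / len(observations)

print(recent_biased)  # pulled toward today's 10.0
print(plain)          # treats both observations equally
```

A short half-life tracks current conditions closely; a longer one keeps enough history to pick up the seasonal patterns Lange mentions.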
What is really unique to machine learning is the speed with which you can analyze and improve your accuracy. Every time you finish a trip, right there we get a report. We're able to look at this in real time and go back and actually improve the predictions based on real-time traffic conditions. All this about big data is so overhyped; it's really about being smart about what you're looking at.