Data Science at Uber

Jagandeep Singh
4 min read · Aug 11, 2020

Uber is one of the most successful startups of all time. It serves millions of rides per day, about 14,000 rides per minute. Uber started as an online ride-hailing service and later expanded into food delivery, trucking, and self-driving taxis.

Uber has its own Machine Learning platform called Michelangelo which is used to create different models for Uber’s various services.

Michelangelo

Michelangelo is an internal ML-as-a-service platform that democratizes machine learning and makes scaling AI to meet the needs of the business as easy as requesting a ride. Michelangelo enables internal teams to seamlessly build, deploy, and operate machine learning solutions at Uber’s scale. It is designed to cover the end-to-end ML workflow: manage data, train, evaluate, and deploy models, make predictions, and monitor predictions. The system also supports traditional ML models, time series forecasting, and deep learning.

The workflow of a machine learning project: defining a problem, prototyping a solution, productionizing the solution, and measuring its impact form the core workflow. The loops throughout the workflow represent the many iterations of feedback gathering needed to perfect the solution and complete the project.

Forecasting

Uber uses a variety of spatiotemporal forecasting models that are able to predict where rider demand and driver-partner availability will be at various places and times in the future. Based on forecasted imbalances between supply and demand, Uber systems can encourage driver-partners ahead of time to go where there will be the greatest opportunity for rides.

Some of the machine learning models Uber uses for demand prediction are:

  • Recurrent neural networks (RNN)
  • Quantile regression forest (QRF)
  • Gradient boosting trees (GBM)
  • Support vector regression (SVR)
  • Gaussian Process regression (GP)

Uber Driver app’s demand heatmap
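To make the idea of a forecasted supply and demand imbalance concrete, here is a minimal sketch in plain Python. It assumes the forecasts already exist (from any of the models listed above) and only shows the last step: comparing the two and ranking zones by expected shortage. The zone names, numbers, and the function `rank_zones_by_imbalance` are hypothetical and are not taken from Uber's systems.

```python
def rank_zones_by_imbalance(forecast_demand, forecast_supply):
    """Return (zone, shortage) pairs sorted by forecasted unmet demand, largest first."""
    imbalance = {
        zone: forecast_demand[zone] - forecast_supply.get(zone, 0.0)
        for zone in forecast_demand
    }
    # Keep only zones where demand is expected to exceed supply.
    shortages = {zone: gap for zone, gap in imbalance.items() if gap > 0}
    return sorted(shortages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical forecasts for one future time window:
# expected ride requests vs. expected available driver-partners.
demand = {"downtown": 120.0, "airport": 80.0, "suburbs": 15.0}
supply = {"downtown": 90.0, "airport": 85.0, "suburbs": 5.0}

ranked = rank_zones_by_imbalance(demand, supply)
print(ranked)
```

In this toy example the airport is dropped because supply is forecast to exceed demand there, so only the zones with a shortage would be suggested to driver-partners.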

Estimated Times of Arrival (ETAs)

Uber uses machine learning to estimate times of arrival, both for a driver reaching the pickup point and for the whole trip. When requesting an Uber ride or an Uber Eats order, the first thing users see is the ETA. Accurate ETAs are important for a positive user experience, but they are among the hardest predictions to get right because they depend on many factors such as traffic, time of day, and weather.

Uber passenger app showing ETA

Uber’s Map Services team developed a sophisticated segment-by-segment routing system that is used to calculate base ETA values. These base ETAs have consistent patterns of errors. The Map Services team discovered that they could use a machine learning model to predict these errors and then use the predicted error to make a correction.
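The error-correction idea can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the "model" here is a one-feature least-squares line that predicts the error from the base ETA, whereas a production model would use many more features and a more powerful learner. The history data and the helper names `fit_line` and `corrected_eta` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# Hypothetical history: (base ETA from routing, observed trip time), in minutes.
history = [(10.0, 11.0), (20.0, 23.0), (30.0, 35.0), (40.0, 47.0)]

base_etas = [base for base, _ in history]
# The training target is the error of the base ETA, not the trip time itself.
errors = [actual - base for base, actual in history]

slope, intercept = fit_line(base_etas, errors)


def corrected_eta(base_eta):
    """Add the predicted error to the base ETA."""
    predicted_error = slope * base_eta + intercept
    return base_eta + predicted_error


print(corrected_eta(50.0))
```

The design point is that the routing system keeps producing the base ETA, and the learned model only has to capture the systematic part of its error.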

One-Click Chat

Uber offers a one-click chat feature for communication between riders and drivers. It uses natural language processing models that predict and display the most likely replies to in-app chat messages. The app shows these replies as buttons so that drivers don’t have to type while driving. Letting drivers respond to rider messages with a single tap reduces distraction.

One-click chat in Driver app
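The flow behind one-click chat (read the incoming message, guess what it is about, offer a few short replies) can be illustrated with a toy example. The sketch below replaces the NLP models with simple keyword matching, so it is a stand-in for the idea and not how Uber's models work. The intents, keywords, and replies are all hypothetical.

```python
import re

# Hypothetical intents with trigger keywords (a stand-in for a learned intent model).
INTENT_KEYWORDS = {
    "where_are_you": {"where", "location"},
    "eta": {"long", "when", "minutes"},
    "greeting": {"hi", "hello", "hey"},
}

# Hypothetical canned replies for each intent.
REPLIES = {
    "where_are_you": ["I'm on my way.", "I'm at the pickup point."],
    "eta": ["About 5 minutes.", "Almost there."],
    "greeting": ["Hello!", "Hi, I'm your driver."],
}

FALLBACK = ["OK", "Thanks"]


def suggest_replies(message, top_k=2):
    """Return up to top_k reply suggestions for an incoming chat message."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    # Score each intent by how many of its keywords appear in the message.
    scores = {intent: len(words & kws) for intent, kws in INTENT_KEYWORDS.items()}
    matched = [intent for intent in scores if scores[intent] > 0]
    matched.sort(key=lambda intent: scores[intent], reverse=True)
    suggestions = []
    for intent in matched:
        suggestions.extend(REPLIES[intent])
    return suggestions[:top_k] or FALLBACK[:top_k]


print(suggest_replies("Where are you?"))
```

Each suggestion would be rendered as a button, so the driver answers with a single tap.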

Uber Eats

Uber Eats uses a number of machine learning models to make predictions that optimize the user experience each time the app is opened. An ML-powered ranking model suggests restaurants and menu items based on historical data and information from the user’s current session in the app. Uber Eats also uses machine learning models to estimate meal arrival times.
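As a rough illustration of ranking with both historical and in-session signals, here is a small sketch. The score is a hand-weighted sum, used only as a stand-in for a learned ranking model. The features, weights, and restaurant names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Restaurant:
    name: str
    past_orders: int         # historical signal: how often this user ordered here
    viewed_in_session: bool  # session signal: user opened this menu just now
    rating: float            # average rating on a 0 to 5 scale


def score(r, w_history=1.0, w_session=2.0, w_rating=0.5):
    """Hand-weighted score; a real ranker would learn this from data."""
    history = min(r.past_orders, 5)  # cap so one feature cannot dominate
    return w_history * history + w_session * r.viewed_in_session + w_rating * r.rating


candidates = [
    Restaurant("Noodle House", past_orders=3, viewed_in_session=False, rating=4.0),
    Restaurant("Taco Stand", past_orders=0, viewed_in_session=True, rating=4.5),
    Restaurant("Pizza Place", past_orders=1, viewed_in_session=True, rating=5.0),
]

ranked_names = [r.name for r in sorted(candidates, key=score, reverse=True)]
print(ranked_names)
```

Here the session signal lifts "Pizza Place" above a restaurant with more past orders, which is the kind of trade-off a ranking model has to make.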

Self-Driving Cars

Uber’s self-driving car systems use deep learning models for a variety of functions, including object detection and motion planning. The modelers use Michelangelo’s Horovod for efficient distributed training of large models across a large number of GPU machines.
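Horovod is built around data-parallel training: each worker computes gradients on its own slice of the data, the gradients are averaged across workers (an allreduce), and every worker applies the same update. The sketch below simulates that averaging step in a single Python process with a one-parameter model. It does not use the Horovod API, and the data, learning rate, and the helper `allreduce_mean` are hypothetical.

```python
def gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def allreduce_mean(values):
    """Simulated allreduce: every worker would receive this average."""
    return sum(values) / len(values)


# Toy data generated from y = 2 * x, split into equal shards (one per simulated worker).
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[0:2], data[2:4]]

w = 0.0
learning_rate = 0.05
for _ in range(50):
    # Each worker computes a gradient on its own shard only.
    local_grads = [gradient(w, shard) for shard in shards]
    # All workers apply the same averaged gradient, so their weights stay in sync.
    w -= learning_rate * allreduce_mean(local_grads)

print(w)
```

With equal-sized shards, the averaged gradient equals the gradient over the full batch, which is why adding workers speeds up training without changing what is being optimized.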

Self-Driving Car Anatomy

Data Science Tools

Python is the go-to data science programming language at Uber and is used extensively by the Uber data team. Commonly used third-party modules include NumPy, SciPy, Matplotlib, and Pandas. The data team occasionally uses R, Octave, or MATLAB for prototypes or one-off data science projects, but not in the production stack. D3 is the preferred data visualization tool at Uber, and Postgres is the preferred SQL database.

To Conclude

Uber collects data on millions of rides each day, and the more data it collects, the better its models get. Uber has kept moving forward with ideas like self-driving cars, hot-air balloon transportation, food delivery services, Uber boats, and even a helicopter service called UberCHOPPER. Uber also plans to roll out flying taxis: the company seems quite serious about its future project called “Elevate,” which aims to bring aviation to everyday people.

Sources

  1. https://eng.uber.com/scaling-michelangelo/
  2. https://eng.uber.com/michelangelo-machine-learning-platform/
  3. https://medium.com/@jrodthoughts/some-things-uber-learned-from-running-machine-learning-at-scale-70dccdfb944d
  4. http://techgenix.com/uber-machine-learning/
  5. https://www.dezyre.com/article/how-uber-uses-data-science-to-reinvent-transportation/290
