Google and Harvard test a machine learning approach to food safety

A new digital health epidemiology model that uses a data-driven approach to foodborne illnesses shows promising results.


The system for reporting food poisoning is surprisingly low-tech; for most public health departments in the US, if you want to report a problem at a restaurant, you can email or call in a report. For instance, New York City has a what/where/who form for making complaints.

A research team at Google and the Harvard T.H. Chan School of Public Health tested a new way to spot foodborne illnesses more quickly and accurately by using a combination of search queries and location data. "Machine-learned epidemiology: real-time detection of foodborne illness at scale" was published in npj Digital Medicine in November 2018. The article explained how to build a data-driven model to identify restaurants that are likely to have health code violations.

The team built a machine-learning model called FINDER that is designed to predict foodborne illness in real time. The researchers used anonymous, aggregated web search and location data to figure out which restaurants have food safety violations that may be making people sick. This forward-looking method has the potential to replace the common approach in the US, which relies on after-the-fact reports from individual consumers and twice-a-year restaurant inspections by health departments.


Spotting unsafe restaurants sooner: How FINDER works

First, FINDER looks for search queries that suggest a person has food poisoning. The model uses machine learning to identify all the ways symptoms of food poisoning are described by Google search users. The next step is to look up the restaurants those individuals visited by using anonymized location history. This data from search and location logs comes from users who have opted to share their location data.

To filter out the noise inherent in search queries, the study team used what it calls a "privacy preserving supervised machine-learned classifier." This technique takes into account the results of the query, which results the searcher clicked on, and the content of the pages the user viewed as a result of the search.
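The two steps described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: a simple keyword matcher stands in for the machine-learned classifier, the sketch ignores the timing of visits and searches, and the names `looks_like_food_poisoning` and `restaurant_scores`, along with all of the sample data, are hypothetical.

```python
from collections import defaultdict

# Hypothetical stand-in for the study's machine-learned classifier.
# The classifier described in the study also weighs clicked results
# and page content; this toy version only matches a few sample terms.
SYMPTOM_TERMS = ("food poisoning", "stomach cramps", "vomiting", "diarrhea")


def looks_like_food_poisoning(query: str) -> bool:
    """Return True if the query mentions any of the sample symptom terms."""
    text = query.lower()
    return any(term in text for term in SYMPTOM_TERMS)


def restaurant_scores(visits, queries):
    """Share of each restaurant's visitors who made a flagged query.

    visits:  iterable of (user_id, restaurant) pairs
    queries: iterable of (user_id, query_text) pairs
    """
    sick_users = {user for user, text in queries if looks_like_food_poisoning(text)}
    visitors = defaultdict(set)
    for user, restaurant in visits:
        visitors[restaurant].add(user)
    return {
        name: len(users & sick_users) / len(users)
        for name, users in visitors.items()
    }


# Made-up sample data, for illustration only.
visits = [("u1", "Cafe A"), ("u2", "Cafe A"), ("u3", "Diner B"), ("u4", "Diner B")]
queries = [
    ("u1", "stomach cramps after eating"),
    ("u2", "Food poisoning symptoms"),
    ("u3", "best pizza near me"),
]

scores = restaurant_scores(visits, queries)
print(scores)
```

Because the score is a share of all visitors rather than a single report, one unrelated stomach bug among many diners would barely move it, which is the intuition behind aggregating across users.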


To gauge the power of FINDER, the research team tested the system in Chicago and Las Vegas for about four months in each city. Each morning, city inspectors were given a list of restaurants to visit that included some identified by FINDER. Inspectors then examined the restaurants to identify health code violations. During the study, the health departments continued with their usual inspection schedules as well.

The study included four data sets:

  • All restaurant inspections not prompted by FINDER (baseline)
  • Regularly scheduled inspections (routine)
  • Inspections prompted by complaints (complaint)
  • Inspections recommended by FINDER (finder)

In Chicago, there were 5,880 inspections during the study, with 71 prompted by FINDER analysis. In Las Vegas, there were 5,038 inspections with 61 prompted by FINDER. The test for the machine learning analysis was whether it was better than standard health department protocols at spotting unsafe restaurants.

About half of the restaurants that FINDER flagged were unsafe upon inspection. In the baseline group of inspections, 25% of restaurants were unsafe. FINDER did a better job of identifying restaurants in the low-risk category than in the high-risk category.
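To put those figures in proportion, the short calculation below works out how small a share of all inspections FINDER prompted in each city and how the two unsafe rates compare. It uses only the numbers quoted above, with one assumption: "about half" is rounded to 0.50.

```python
# Inspection counts reported for the study.
inspections = {
    "Chicago": {"total": 5880, "finder": 71},
    "Las Vegas": {"total": 5038, "finder": 61},
}

# Share of each city's inspections that FINDER prompted.
shares = {
    city: counts["finder"] / counts["total"]
    for city, counts in inspections.items()
}
for city, share in shares.items():
    print(f"{city}: {share:.1%} of inspections were prompted by FINDER")

# Assumption: "about half" is rounded to 0.50 for this comparison.
finder_unsafe_rate = 0.50
baseline_unsafe_rate = 0.25
ratio = finder_unsafe_rate / baseline_unsafe_rate
print(f"FINDER-flagged restaurants were about {ratio:.0f}x as likely to be unsafe")
```

In both cities FINDER accounted for roughly 1.2% of inspections, yet the restaurants it flagged were about twice as likely to be found unsafe as those in the baseline group.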

Machines beat out humans again

The study team also compared the results of inspections recommended by FINDER to inspections prompted by customer complaints. Because many restaurant customers in Las Vegas are tourists, the number of complaints is low in that city; for that reason, this part of the analysis included complaints from Chicago only. Restaurants identified by FINDER were more likely to be given an unsafe designation than complaint restaurants. The researchers concluded that FINDER was more robust than individual complaints because the machine-learning approach aggregates information from numerous people who ate at the same restaurant.

FINDER also avoids the recall bias that can affect complaint-based reporting systems. Recall bias happens when a person does not remember previous events or experiences accurately or omits details due to the passage of time. For example, recall bias is a risk when a person gets food poisoning one week and makes a complaint at the health department the following week. Experiences the person has had since visiting a particular restaurant and the passage of time may have affected the person's memory of the restaurant in question.


This machine-learning model is the most recent example of digital health epidemiology. Marcel Salathe, a professor at the Swiss Federal Institute of Technology, described the difference between this new approach and the traditional methods of discovering the causes of diseases in populations. Instead of a health inspector going door-to-door to ask individuals about their sources of food or water, this digital health version of disease detective work uses data generated outside the public health system. With the FINDER example, the data source is search queries instead of personal health surveys.

Fighting the "we've always done it that way" battle

It's been hard for health departments to make the shift to a data-driven approach to food safety, at least based on Chicago's experience. In 2014, people in Chicago's Department of Innovation and Technology built an algorithm similar to FINDER. It used publicly available data to predict which restaurants were most likely to violate health codes, based on the information from previously recorded violations. This approach also used social media mining and illness prediction technologies to target inspections. It worked: The algorithm found violations about 7.5 days before the normal inspection routine did.

A goal of the project was to make it easy for other health departments to adopt this method, so the Chicago team posted the project code on GitHub. Initially, only one other health department tested the new system. The first hurdle--changing the standard approach to restaurant inspections--is apparently too high for widespread adoption.

The study authors from Google and Harvard said public health departments did not have enough inspectors to do a broader test of FINDER's recommendations: "the limited bandwidth provided to us by city/county health departments ... restricted the number of inspections FINDER could suggest in a given city."

The FINDER model is still in the research phase and is not publicly available to health departments at this point. The study authors say that similar algorithms could be built with data from other search engines that include location history, and that these could possibly generate comparable results.

Exploring new applications

Tomer Shekel, senior product manager at Google, said the team is working with the Harvard School of Public Health and other agencies to continue the research in this area. Shekel said that the research team is looking for other public health challenges that could be addressed with a digital epidemiology approach.

"Location data presents a rich source of information and does not have to be limited to business establishments," he said. "We can also reason spatially at the level of parks or counties."

The Harvard/Google team is considering vector-borne diseases and the impact of air quality on human health as potential study topics. Mosquitoes, ticks, triatomine bugs, sandflies, and blackflies spread vector-borne disease.


The biggest advantage of FINDER and other digital epidemiology tools may be the ability to make the food inspectors' work more efficient. The FINDER tool can "rank the relative risk of all restaurants in a city, and thus can provide more substantial lists of problematic restaurants to cities in the future to prioritize inspections."
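As a minimal sketch of what prioritizing from a ranked list could look like, the Python snippet below sorts restaurants by a risk score and keeps only as many as inspectors can visit. The `prioritize` function, the `capacity` parameter, and the scores themselves are hypothetical; the study does not describe how cities would consume FINDER's rankings.

```python
def prioritize(risk_scores, capacity):
    """Return the `capacity` highest-risk restaurants, highest risk first."""
    ranked = sorted(risk_scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:capacity]]


# Made-up risk scores, for illustration only.
risk_scores = {"Cafe A": 0.82, "Diner B": 0.15, "Grill C": 0.47, "Bistro D": 0.64}

# Suppose inspectors can fit in two extra visits today.
todays_list = prioritize(risk_scores, capacity=2)
print(todays_list)
```

With a ranking in hand, a department with limited staff could simply raise or lower `capacity` to match the number of inspectors available on a given day.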

Budgets and staff are always in short supply at public health departments around the country--automating any part of the inspection process could help inspectors target restaurants most likely to have a problem and prevent people from getting sick in the first place.


By Veronica Combs
