Big Data

How an algorithm detected the Ebola outbreak a week early, and what it could do next

HealthMap is a data mapping tool that detects and tracks diseases across the world. The system discovered the Ebola virus outbreak nine days early.


The Ebola outbreak has spread incredibly fast. It's a dangerous disease, not only because it is highly contagious and often fatal, but also because it is surrounded by stigmas and myths, particularly in the African communities where it is spreading.

An algorithm on HealthMap, an international mapping tool that detects and tracks diseases, found Ebola, called a "mystery hemorrhagic fever," just over a week before it spread, though the founders didn't initially realize the importance of what they discovered.

Eight people reported the mysterious disease in March. It caught the HealthMap team's attention, but they didn't call out a massive issue until it was confirmed as Ebola on March 22.

"That was concerning, but still oftentimes these events tend to be smaller in nature, and this is much more substantial than what we've seen in the past as far as the time it's [taken to] spread. It's unusual as far as what we've seen," said John Brownstein, co-founder of HealthMap and an epidemiologist.

The HealthMap team realized the situation was quite serious, so they created a dedicated HealthMap visualization: healthmap.org/ebola. The tool shows the full timeline of events, including locations, case counts (suspected, confirmed, deaths), and original source documentation.

There are few other places to find comprehensive, easy-to-understand details on the full scope and progression of the Ebola outbreak. Traffic to the HealthMap site has skyrocketed as citizens and professionals seek out clear information, said Clark Freifeld, co-founder of HealthMap.

Ebola hemorrhagic fever is a viral disease that first appeared in Africa in 1976. It is extremely infectious and is passed through bodily fluids. Typically, symptoms like fever, rash, vomiting, diarrhea, and stomach pain appear after eight to 10 days. There's no specific treatment or vaccine, and the fatality rate can be as high as 90%.

According to the World Health Organization, as of August 22, these are the stats of the Ebola outbreak in Africa, which emerged in March 2014.

  • Guinea: 607 cases, 406 deaths
  • Liberia: 1,082 cases, 624 deaths
  • Nigeria: 16 cases, 5 deaths
  • Sierra Leone: 910 cases, 392 deaths
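As a rough arithmetic check on the figures above, the following Python sketch computes a crude fatality ratio (reported deaths divided by reported cases) per country and overall. It uses only the WHO counts quoted in this article; the function and variable names are illustrative. The ratio is only approximate, since case counts include suspected cases and the outbreak was still in progress.

```python
# WHO figures as of August 22, 2014, as quoted in the article: (cases, deaths)
who_counts = {
    "Guinea": (607, 406),
    "Liberia": (1082, 624),
    "Nigeria": (16, 5),
    "Sierra Leone": (910, 392),
}


def crude_fatality_ratio(cases, deaths):
    """Reported deaths divided by reported cases, as a percentage."""
    return 100.0 * deaths / cases


total_cases = sum(cases for cases, _ in who_counts.values())
total_deaths = sum(deaths for _, deaths in who_counts.values())
overall_ratio = crude_fatality_ratio(total_cases, total_deaths)

for country, (cases, deaths) in who_counts.items():
    print(f"{country}: {crude_fatality_ratio(cases, deaths):.1f}%")
print(f"Overall: {total_deaths} deaths / {total_cases} cases = {overall_ratio:.1f}%")
```

Across the four countries this works out to 1,427 deaths among 2,615 reported cases, or roughly 55%.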

There are also reported cases in the Democratic Republic of the Congo, Europe, and the US, though none have been confirmed. Two patients who contracted the disease in Africa were brought to the US for experimental treatment, but not much else has been done for the outbreak.

"The situation was something we were keeping close tabs on as it evolved at the time, but there was still a small number of initial cases, and Ebola outbreaks have been successfully contained in the past," said Freifeld. "The real concern arose when cases kept increasing this summer, especially given the high fatality rate of the strain (60%)."

Brownstein said the data they collect is a real-time view of what's happening. The Ebola outbreak reminds him of what HealthMap showed when the H1N1 (swine flu) virus spread: they had many reports from local news media early on, which were eventually confirmed. So this instance is not unfamiliar.

"[This instance with] Ebola is some of the most usage we've ever had at the site, and it's raising awareness for infectious disease," Brownstein said.

In addition to the structured information they capture about locations and case counts, HealthMap also captures narratives of case situations, such as patients leaving their containment areas, civil unrest, and other impacts of the outbreak.

"These factors are key for fully understanding the situation and how to address it," Freifeld said.

The web crawler on HealthMap collects information from hundreds of thousands of sources across the internet, based mostly on keyword searches for disease-related terms. Of course, it filters out things such as "outbreak of home runs" or "Justin Bieber fever." It sifts through RSS feeds and APIs and analyzes the text and geographic location of the information and removes the noise. The automated process is repeated every hour, 24 hours a day.
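The keyword-and-noise filtering described above might look something like the following Python sketch. It is not HealthMap's actual code: the term lists, the function names, and the simple phrase-matching approach are all assumptions made for illustration.

```python
import re

# Hypothetical term lists; the article does not give HealthMap's real ones.
DISEASE_TERMS = ["ebola", "hemorrhagic fever", "outbreak", "fever", "cholera"]
NOISE_PHRASES = ["outbreak of home runs", "bieber fever"]


def contains_phrase(text, phrase):
    """True if the phrase appears in the text on whole-word boundaries."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None


def is_candidate(headline):
    """Keep a headline if it mentions a disease term and no known noise phrase."""
    text = headline.lower()
    if any(contains_phrase(text, phrase) for phrase in NOISE_PHRASES):
        return False
    return any(contains_phrase(text, term) for term in DISEASE_TERMS)


headlines = [
    "Mystery hemorrhagic fever kills 8 in Guinea",
    "Justin Bieber fever sweeps the city",
    "Red Sox enjoy outbreak of home runs",
    "City council approves new budget",
]
kept = [h for h in headlines if is_candidate(h)]
print(kept)
```

A fixed exclusion list like this only catches noise someone has already thought of, which is one reason a system of this kind would also lean on a trained classifier.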

The algorithm is 90% accurate at filtering out the junk, Freifeld said, and it is continually improving through feedback from analysts. It is a machine-learning algorithm: it looks at example articles that HealthMap analysts have categorized and searches for patterns in how new articles match up with the history of relevant and irrelevant content.
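The article does not say which machine-learning method HealthMap uses. As one illustration of learning from analyst-labeled examples, here is a minimal naive Bayes text classifier in Python. The choice of naive Bayes, the class name, the labels, and the training headlines are all assumptions for the sketch, not details from HealthMap.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesFilter:
    """Multinomial naive Bayes with add-one smoothing over word counts."""

    def __init__(self):
        self.word_counts = {"relevant": Counter(), "junk": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Add one labeled example, e.g. an article an analyst has categorized."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def classify(self, text):
        """Return the label with the higher log-probability score."""
        vocab = set(self.word_counts["relevant"]) | set(self.word_counts["junk"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            denom = sum(counts.values()) + len(vocab)
            for word in tokenize(text):
                if word in vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical labeled examples standing in for analyst-categorized articles.
model = NaiveBayesFilter()
model.train("Mystery hemorrhagic fever kills eight in Guinea", "relevant")
model.train("Health ministry confirms cholera cases in the capital", "relevant")
model.train("Hospital reports new cases of viral fever", "relevant")
model.train("Bieber fever grips fans ahead of concert", "junk")
model.train("Team enjoys an outbreak of home runs", "junk")
model.train("Fans catch playoff fever as tickets sell out", "junk")

print(model.classify("Officials confirm hemorrhagic fever cases"))
print(model.classify("Fans catch Bieber fever at concert"))
```

In this sketch, analyst feedback would simply be further calls to `train` with corrected labels, so the word statistics shift as more articles are categorized.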

The value of HealthMap, Brownstein said, was initially in disease surveillance, but it has moved on to something better.

"Getting case information and data of very specific epidemiology, of outbreaks, and fit that into disease models, where it might spread to, how bad it might be, then integrate it with climate data or transportation data, all of a sudden it's data you wouldn't necessarily have access to," Brownstein said.

And that, he added, is critical to furthering research.

"For me personally, it combines two passions of mine, computer science and public health. Our work gives us a chance to apply advanced artificial intelligence algorithms to problems that matter in global health," Freifeld said.

Freifeld had the idea for HealthMap as a way of organizing the large amounts of outbreak data that, he said, at the time were widely available on the internet but scattered and difficult to use.

Brownstein's background is in epidemiology, specifically infectious disease mapping, and he especially focuses on emerging diseases and how to use technology to map and track them.

The two started HealthMap in 2006 as a side project on nights and weekends outside of work. The website, a project of Boston Children's Hospital, rapidly became a full-time effort as it started to take off in fall 2006.

Since its inception, HealthMap has evolved in three main ways. First, it is continually incorporating more sources of data. It has expanded into 15 languages and uses social media sources, looking at drivers of disease such as attitudes toward vaccines and animal health. Second, the team has greatly expanded how it categorizes outbreaks. Disease and location are obviously key, but the team now also looks at setting (e.g., hospital, military base), nature of the disease (e.g., antibiotic resistance), case counts, and location mapping. And third, the team is consistently engaging more with users. HealthMap accepts reports from anyone via the web or mobile devices.

"We are seeing the role of the public as 'citizen epidemiologists' rather than just passive consumers of information," Freifeld said.

Part of HealthMap's importance is its usefulness to epidemiologists and other experts, who engage with it and report back to the team, but it is also meant to engage communities around the world. HealthMap has projects focused on that concept as well, Brownstein said.

"We're trying to build better tools to provide data and have them return information to fill out the map," he said.

The HealthMap team is continually expanding the list of data sources it taps into, and it is researching how to use data from Wikipedia, Yelp, Twitter, mobile apps, and other internet sources, all of which can provide early signals of disease activity.

"What's exciting about HealthMap is that we are capturing all this information and making it available and accessible," Freifeld said. "Our users include public health professionals, but actually the vast majority of our users come from the general public. You don't need to be an expert to benefit from the service."

About

Lyndsey Gilpin is a Staff Writer for TechRepublic. She covers sustainability, tech leadership, 3D printing, and social entrepreneurship. She's co-author of the upcoming book, Follow the Geeks.

4 comments
Slartibartfass

Wow. I put 100 predictions in envelopes ... one hit. Hello here comes the new Nostradamus.

Adam_12345

"....it was a virus, an infection, you didn't any doctor to tell you that, and it wasn't in tv any more, it was outside your..." :).....but seriously speaking, this fact only proves one fact that can't be undermined - computers will always be more accurate than we are

Neil Postlethwaite

"

An algorithm on HealthMap, an international mapping tool that detects and tracks diseases, found Ebola, called a "mystery hemorrhagic fever," just over a week before it spread, though the founders didn't initially realize the importance of what they discovered.

Eight people reported the mysterious disease in March. It caught the HealthMap team's attention, but they didn't call out a massive issue until it was confirmed as Ebola on March 22."

It's a bit of a SmartPlanet grade extrapolation and correlation tall tale is it not. Fairly self evidently, if the doctors logging the "mystery hemorrhagic fever' data had identified it as Ebola before they did on Mar 22, it would have been called out earlier. Not like HealthMap did the Ebola confirmation work, just some after the event statistics. Same with other 'big data' solutions claiming to predict various trends, behaviors or activity.


The BBC's More or Less investigation into Ebola is much more compelling.


http://www.bbc.co.uk/programmes/p0244jdt


sf_jeff

@Neil Postlethwaite I sort of agree, but I would suggest that an excellent way to use something like HealthMap is when you see 8 cases of something rare and deadly then you pick up a phone and call the CDC and the WHO.  It looks like they missed the opportunity to do that this time, but would probably do that next time.  It is possible that some lives might be saved in the next epidemic by tools like this.


I kind of look at it like this.  Google should not try to take credit for the content at the sites it links to, but the good news is that it doesn't need to to be useful.
