Big data graphs are playing an important role in the coronavirus pandemic

Researchers from many organizations are turning to graphs in an effort to connect pieces of data related to COVID-19 and find a treatment.


TechRepublic's Karen Roby talked with Alicia Frame, lead product manager for data science for Neo4j, a graph database platform, about the role of big data and graphs in COVID-19 research. The following is an edited transcript of their conversation.


Alicia Frame: Neo4j is a graph database, which means we store data as a graph. You can think of it like this: nodes are the nouns, the things, and they're connected to each other by relationships. That makes us really good at storing disparate data from different sources, especially data where the relationships are important. Think of a social network: I know this other person, so we're two nodes with a relationship between us. Or in the pharmaceutical sector, think of genes, chemicals, and diseases. I have data from different sources that characterize which genes are associated with different diseases and which drugs exist to treat those diseases. There are also publications pulled into this graph about which genes are expressed in patients who have been exposed to a certain virus, in this case.
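To make the nodes-and-relationships idea concrete, here is a minimal in-memory sketch in plain Python. It is not Neo4j's storage engine or API; the `Node` and `Graph` classes, the relationship types `ASSOCIATED_WITH` and `TREATS`, and the gene, disease, and drug names are hypothetical placeholders. The two-hop lookup asks which drugs treat a disease linked to a given gene.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    label: str  # the kind of "noun", e.g. "Gene", "Disease", "Drug"
    name: str


class Graph:
    """Tiny directed graph whose edges carry a relationship type."""

    def __init__(self) -> None:
        # Adjacency list: source node -> list of (relationship type, target node)
        self.edges: dict[Node, list[tuple[str, Node]]] = {}

    def relate(self, src: Node, rel: str, dst: Node) -> None:
        self.edges.setdefault(src, []).append((rel, dst))

    def outgoing(self, src: Node, rel: str) -> list[Node]:
        return [dst for r, dst in self.edges.get(src, []) if r == rel]

    def incoming(self, dst: Node, rel: str) -> list[Node]:
        # Scans every edge; fine for a toy example, not for large graphs.
        return [s for s, out in self.edges.items() for r, d in out if r == rel and d == dst]


def drugs_for_gene(g: Graph, gene: Node) -> list[str]:
    """Two hops: Gene -ASSOCIATED_WITH-> Disease <-TREATS- Drug."""
    names = {drug.name
             for disease in g.outgoing(gene, "ASSOCIATED_WITH")
             for drug in g.incoming(disease, "TREATS")}
    return sorted(names)


# Placeholder data, not real biology.
gene = Node("Gene", "GENE_A")
disease_x = Node("Disease", "DISEASE_X")
disease_y = Node("Disease", "DISEASE_Y")
g = Graph()
g.relate(gene, "ASSOCIATED_WITH", disease_x)
g.relate(Node("Drug", "DRUG_1"), "TREATS", disease_x)
g.relate(Node("Drug", "DRUG_2"), "TREATS", disease_y)
print(drugs_for_gene(g, gene))  # ['DRUG_1']
```

The `incoming` method scans every edge, which is acceptable only for a toy; a real graph database stores relationships so they can be followed efficiently from either end.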

A graph database is really good at representing these really complex concepts. With COVID-19, back in March, we started to get a lot of requests. We have a really active developer community, and we spend a lot of time talking to our users: What are they interested in? What do they need? We started getting requests like, "Hey, I have a project where I want to build a knowledge graph about COVID-19. Do you guys have any resources to help us out?" Or, "Hey, I want to build a contact tracing app to figure out who's been interacting with someone with a disease. Can you guys get me set up?" We have a cloud platform called Aura, so we also heard, "Can I have some credits to run on Aura?" At first it was a couple of requests, then more and more, and we decided it was really important to come to the forefront and respond.
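Contact tracing, one of the requests mentioned above, maps naturally onto a graph: people are nodes and recorded contacts are relationships. The sketch below is a hypothetical Python example, not any particular app's logic. It uses breadth-first search to list everyone within a given number of hops of a person, along with their hop distance. The function names and sample contacts are made up for illustration, and a real app would also need to account for when and for how long contacts happened.

```python
from collections import deque


def build_contact_graph(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Turn (person, person) contact records into an undirected adjacency map."""
    graph: dict[str, set[str]] = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def contacts_within(graph: dict[str, set[str]], source: str, max_hops: int) -> dict[str, int]:
    """Breadth-first search from `source`: map every person reachable in at most
    `max_hops` contacts to their hop distance (the source itself is excluded)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        if dist[person] == max_hops:
            continue  # don't expand beyond the hop limit
        for other in graph.get(person, set()):
            if other not in dist:
                dist[other] = dist[person] + 1
                queue.append(other)
    del dist[source]
    return dist


# Made-up contact records.
contacts = build_contact_graph([("Ana", "Ben"), ("Ben", "Cai"), ("Cai", "Dee")])
print(contacts_within(contacts, "Ana", 2))  # {'Ben': 1, 'Cai': 2}
```

Breadth-first order guarantees each person is first reached by a shortest chain of contacts, so the recorded hop distance is the minimum.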

Starting at the end of March, we made a concerted effort to say, "Look, if you're doing something with COVID-19, we want to help." We devoted internal engineering resources, as well as our field team, to help people get up and running, figure out the engineering work, and answer their questions. We also held a big Graphs4Good hackathon to get everybody together in one place and build off each other's ideas.


Karen Roby: The collaboration is so important. Talk a little bit about how this type of graph helps pharmaceutical companies and others trying to understand and track this virus.

Alicia Frame: If you were looking at drug discovery or curing diseases in the past, what you would do is read a lot of papers. There are hundreds, even thousands, of papers published every day with information like, "I studied this gene and it did this." That's really hard to consume. If you're one drug discovery scientist, it's basically impossible to keep on top of the literature, and with COVID-19 in particular, the pace of publication is crazy. With Neo4j, we have folks using natural language processing (NLP) to parse those papers and extract the key concepts. What genes are mentioned in this paper? What diseases are they talking about? What are the symptoms, and what molecular pathways are involved? That data is then fed into a graph.
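The extraction step can be approximated very crudely with dictionary matching. The sketch below is an assumption for illustration, not the NLP pipeline described in the interview: it looks up terms from a small hand-written vocabulary (`VOCAB`, hypothetical) in each paper's text and records a Paper-MENTIONS-Concept link for every match. Production systems typically use trained entity-recognition models that handle synonyms, abbreviations, and context.

```python
import re

# Hypothetical vocabulary: lowercase surface form -> (concept type, canonical name).
VOCAB = {
    "spike protein": ("Protein", "spike protein"),
    "ace2": ("Gene", "ACE2"),
    "fever": ("Symptom", "fever"),
    "covid-19": ("Disease", "COVID-19"),
}


def extract_concepts(text: str) -> set[tuple[str, str]]:
    """Return (type, name) for each vocabulary term found as a whole word, ignoring case."""
    lowered = text.lower()
    found = set()
    for term, concept in VOCAB.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(concept)
    return found


def build_mention_graph(papers: dict[str, str]) -> dict[str, set[tuple[str, str]]]:
    """Map each paper ID to the concepts it mentions: Paper -[:MENTIONS]-> Concept edges."""
    return {paper_id: extract_concepts(text) for paper_id, text in papers.items()}


# Made-up example abstracts.
papers = {
    "p1": "ACE2 binds the spike protein of SARS-CoV-2.",
    "p2": "Fever is a common COVID-19 symptom.",
}
graph = build_mention_graph(papers)
print(sorted(graph["p1"]))  # [('Gene', 'ACE2'), ('Protein', 'spike protein')]
```

Because every paper links to shared concept nodes, two papers that never cite each other still become connected through the genes or symptoms they both mention.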


You have all of these different papers, you're extracting the key concepts, and then they're all knitted together in a graph so that you can rapidly start exploring and traversing it. With COVID-19, you might say, "We know the spike protein is important, so I want to find all of the antibody studies that mention interactions with the spike protein." Now I don't have to read 30 papers. Instead, I can write a query in Cypher, our query language, to answer that question really rapidly. Basically, you have all of this complicated, interconnected data right at your fingertips.
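As a hypothetical illustration of the spike-protein question, assume a schema in which `Paper` nodes have `MENTIONS` relationships to `Concept` nodes. The labels, property names, and sample titles below are invented for this sketch. The Cypher in the comment shows one way such a query could be written; the Python function performs the same set logic in memory, returning papers that mention every required concept.

```python
# Hypothetical schema, for illustration only:
#   (:Paper {title})-[:MENTIONS]->(:Concept {name})
# A Cypher query for papers that mention both the spike protein and antibodies could be:
#   MATCH (p:Paper)-[:MENTIONS]->(:Concept {name: 'spike protein'}),
#         (p)-[:MENTIONS]->(:Concept {name: 'antibody'})
#   RETURN p.title


def papers_mentioning_all(mentions: dict[str, set[str]], required: set[str]) -> list[str]:
    """Return, sorted, the titles of papers whose concepts include every required concept."""
    return sorted(title for title, concepts in mentions.items() if required <= concepts)


# Made-up sample data: paper title -> concepts extracted from it.
mentions = {
    "Paper A": {"spike protein", "antibody"},
    "Paper B": {"spike protein", "ACE2"},
    "Paper C": {"antibody", "fever"},
}

print(papers_mentioning_all(mentions, {"spike protein", "antibody"}))  # ['Paper A']
```

In the Cypher version, reusing the variable `p` in both patterns is what forces the two `MENTIONS` relationships to start from the same paper, mirroring the subset test in the Python function.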

Karen Roby: I know you guys work with many different partners and companies that are involved in this. If you could just maybe talk about one in particular. Just mention the partnership there and what the role is.

Alicia Frame: One really cool example that's close to my heart is tellic, a company based in New York City. They work for pharmaceutical companies doing the kind of text processing I talked about: parsing papers, internal data, and patents, and putting that into a graph with a UI so that users have that information at their fingertips. Neo4j provides the database; they build tooling to populate it and make it easy to use. They've built out tellic graph.C19, a public graph of data parsed from publications and patents. Normally they sell that kind of data to pharmaceutical companies, but because we're all working on the same thing together, they've made it freely accessible to researchers, drug companies, and anyone interested in using the data. They're working with the Empire Institute in New York state to help answer questions such as how a drug like remdesivir acts and how likely it is to actually help with COVID-19.

Karen Roby: Have you seen a real difference in how much people want to be involved and in their desire to really make a difference?

Alicia Frame: Definitely. Drug discovery is historically an area where companies want to keep their secrets secret, right? If I'm going to discover a new drug, I want to keep that gene secret until I can patent it. This is different. Folks are open to collaboration because we need to respond as quickly as possible. It was really amazing to see companies that would usually be competitors coming together to work on graph projects, and to see people volunteer their time. I worked with some clinicians who were spending their time outside of work hours doing engineering on these projects just because they wanted to help. It was really nice to see people who don't normally work together come together to open-source information and solve problems collaboratively, instead of everybody hiding off in their own silo. I think that's helped a lot with research velocity.



TechRepublic's Karen Roby talked with Alicia Frame, lead product manager for data science for Neo4j, about the role of big data and graphs in COVID-19 research.

Image: Mackenzie Burke