From February to May 2020, the number of scientific papers published on COVID-19 skyrocketed from 29,000 to more than 138,000, according to Salesforce. As people around the world step up to help, the number is expected to keep climbing, with projections of more than one million papers by the end of 2020.
The company believes scientists and researchers on the frontline of the pandemic should not have to spend their time digging through thousands of pages of COVID-19 research. So on Wednesday, Salesforce Research introduced COVID-19 Search, an artificial intelligence (AI)-powered search engine to equip scientists and researchers with the most relevant COVID-19 research. It is designed to help users sort through the clutter to make complicated research information easier to find.
The tool combines neural semantic search and traditional syntactic search to give scientists, researchers and others a more efficient way to find and filter information, Salesforce Research said.
“Searching scientific publications requires different techniques from traditional keyword-matching search engines,” wrote Salesforce researchers in a blog post. “It’s critical that a COVID-19 search engine interpret the proper meaning in a given search, going beyond finding results based on the frequency with which words appear in documents. And with long documents, it’s valuable to quickly surface relevant passages in search results.”
COVID-19 Search addresses this by combining text retrieval and natural language processing (NLP)—including semantic search, state-of-the-art question answering, and abstractive summarization—to better understand the question and surface the most relevant scientific results, the researchers said.
The order of words in a scientific search query is very specific, and a slight change in that order can drastically change the meaning, the company said. “For example, searching for ‘What expression pathways does SARS-CoV-2 induce?’ is substantially different from ‘What is the expression pathway of SARS-CoV-2?’”
The results need to align with the context of the query, the company said.
“So we combined information retrieval (IR) search with our strengths in NLP to emphasize semantic search that models the meaning behind the query.”
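To see why keyword matching alone struggles with the two example queries, consider a rough sketch that compares them by shared keywords. This is an illustration only, not Salesforce's method: the stopword list, the crude plural stripping and the Jaccard overlap measure are all assumptions made for the example.

```python
import re

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"what", "does", "is", "the", "of"}


def keywords(query: str) -> set[str]:
    """Lowercase, tokenize, drop stopwords, and strip a trailing plural 's'."""
    tokens = re.findall(r"[a-z0-9-]+", query.lower())
    kept = [t for t in tokens if t not in STOPWORDS]
    return {t[:-1] if t.endswith("s") and len(t) > 3 else t for t in kept}


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of all keywords that the two sets share."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


q1 = "What expression pathways does SARS-CoV-2 induce?"
q2 = "What is the expression pathway of SARS-CoV-2?"

overlap = jaccard(keywords(q1), keywords(q2))
print(f"{overlap:.2f}")  # prints 0.75
```

Under these assumptions the two queries share three of four keywords, so a purely keyword-based ranker would treat them as close to interchangeable even though they ask different questions.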
To train the search engine, the Salesforce researchers said they split scientific publications into pairs of paragraphs and citations that could be used to train algorithms to determine if the title of a citation was referenced by a paragraph. The same AI can be used to take a query and find paragraphs in a document set that address it.
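The pairing step described above might look something like the following sketch. The `Paragraph` class, the `build_pairs` function and the placeholder texts are hypothetical, and the way negatives are chosen (titles cited elsewhere in the same collection) is an assumption; the researchers did not describe how non-matching pairs were selected.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    text: str
    cited_titles: list[str]  # titles of the papers this paragraph cites


def build_pairs(paragraphs: list[Paragraph]) -> list[tuple[str, str, int]]:
    """Return (paragraph_text, citation_title, label) triples.

    label is 1 when the paragraph cites the title and 0 otherwise.
    Negatives are titles cited by other paragraphs (an assumption).
    """
    all_titles = sorted({t for p in paragraphs for t in p.cited_titles})
    pairs = []
    for p in paragraphs:
        cited = set(p.cited_titles)
        for title in all_titles:
            pairs.append((p.text, title, 1 if title in cited else 0))
    return pairs


# Placeholder data, not taken from any real publication.
paragraphs = [
    Paragraph("Placeholder paragraph about viral entry.", ["Title A"]),
    Paragraph("Placeholder paragraph about vaccine trials.", ["Title B", "Title C"]),
]

pairs = build_pairs(paragraphs)
for text, title, label in pairs:
    print(label, title, "|", text)
```

A model trained on triples like these learns to score how well a short title matches a paragraph, which is the same shape of problem as scoring how well a paragraph matches a user's query.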
“Semantic search combs through the massive population of documents and returns a subset, maybe 100 or 1,000,” the researchers wrote. “We run these documents through a question-answering AI that treats the user’s query as a question and does its best to generate an answer from the retrieved documents.”
If an answer is contained in any single document, the company said, COVID-19 Search can re-rank the document list to surface that document.
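The retrieve-then-re-rank idea can be sketched in a few lines. Everything here is a stand-in: `rerank` and `toy_answerer` are hypothetical names, the keyword-matching "answerer" only substitutes for a real question-answering model, and placing answer-bearing documents first is an assumed re-ranking rule rather than the one COVID-19 Search uses.

```python
from typing import Callable, Optional

Doc = tuple[str, float]  # (document text, retrieval score)


def rerank(
    docs: list[Doc],
    find_answer: Callable[[str], Optional[str]],
) -> list[tuple[str, float, Optional[str]]]:
    """Put documents that contain an answer first, then sort by score."""
    scored = [(text, score, find_answer(text)) for text, score in docs]
    # False sorts before True, so documents with an answer come first;
    # within each group, higher retrieval scores come first.
    return sorted(scored, key=lambda d: (d[2] is None, -d[1]))


def toy_answerer(keyword: str) -> Callable[[str], Optional[str]]:
    """Stand-in for a QA model: return the first sentence with the keyword."""
    def find(text: str) -> Optional[str]:
        for sentence in text.split(". "):
            if keyword.lower() in sentence.lower():
                return sentence
        return None
    return find


# Documents as they might come back from the semantic search stage.
retrieved = [
    ("Background on coronaviruses.", 0.9),
    ("The spike protein binds ACE2.", 0.6),
    ("Unrelated methods section.", 0.7),
]

ranked = rerank(retrieved, toy_answerer("ACE2"))
for text, score, answer in ranked:
    print(score, text, "->", answer)
```

In this toy run the second document moves to the top despite its lower retrieval score, because it is the only one in which the stand-in answerer finds an answer.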
With the threat of a second wave of infections looming, there is a new sense of urgency for ways to help mitigate and cure COVID-19, the researchers said. “Humanity needs cures, vaccines, and solutions. COVID-19 Search can empower scientists on the front lines to accomplish those tasks faster.”