The search engine was developed using Verizon's Vespa big data processing technology and comes in response to a nationwide open research data set challenge.
Verizon Media is using its Vespa search engine to help medical professionals and researchers explore COVID-19 data without having to build their own search on the backend. The search engine comes in response to an open data set announced by the White House and others last month called the COVID-19 Open Research Dataset Challenge, known as CORD-19.
The search engine works on top of CORD-19.
"The amount of research data on this disease has been increasing at a fast pace," said Rathi Murthy, Verizon Media CTO. "This is good, but also presents a challenge for researchers who need to make sense of it."
Murthy added, "Given our experience with big data at Yahoo, we thought the best way to help was to index the data set and develop a search engine that lets researchers filter and search the 45,000-plus scholarly articles using keywords and simple search terms."
The engine matches and ranks documents for search terms while also recommending other articles relevant to the researcher's area of interest, Murthy said. The articles are being made available for searching via Vespa Cloud. Researchers can get started with a few of the sample queries or, for more advanced queries, visit this page. The documents are updated weekly as new research is published.
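For readers curious about what "matching and ranking" means in practice, here is a minimal Python sketch of the general idea. It is not Vespa's implementation or API: the function names, the sample documents, and the scoring rule (a plain count of query-term occurrences) are all assumptions made for illustration, and a production engine would use far more sophisticated ranking.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(query, documents, limit=10):
    """Return (doc_id, score) pairs for documents matching any query term.

    The score is the number of query-term occurrences in the document,
    a deliberately naive stand-in for a real ranking function.
    """
    terms = set(tokenize(query))
    results = []
    for doc_id, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:  # "matching": keep only documents containing a term
            results.append((doc_id, score))
    # "ranking": highest score first, ties broken by doc_id for stable order
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results[:limit]


# Hypothetical documents, invented for illustration only.
docs = {
    "a1": "Incubation period of coronavirus infections",
    "a2": "Coronavirus transmission and coronavirus incubation in households",
    "a3": "Seasonal influenza vaccination rates",
}

for doc_id, score in search("coronavirus incubation", docs):
    print(doc_id, score)
```

In this toy example the second article ranks first because it mentions the query terms more often, and the influenza article is filtered out because it matches neither term.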
The search engine was built on Verizon Media's big data processing technology, the company said. Verizon said it uses Vespa for applications like recommendation, personalization, and ad targeting.
Verizon is welcoming contributions, and the company advised anyone interested to refer to its contributing guide. The application can be downloaded, and users can index the data set and improve the service. More information on Vespa.ai can be found here.
Seeking AI experts
Additional research will be added to the CORD-19 data set. The initiative is being spearheaded by the White House, the Allen Institute for AI, the Chan Zuckerberg Initiative (CZI), Georgetown University's Center for Security and Emerging Technology (CSET), Microsoft, and the National Library of Medicine (NLM) at the National Institutes of Health.
The group referred to the project as a "call to action for the AI community," and asked the nation's AI experts to launch new text and data mining techniques to help the science community answer critical questions about the virus.
The CORD-19 resource is available on the Allen Institute's SemanticScholar.org website and will continue to be updated as new research is published in archival services and peer-reviewed publications, according to the federal Office of Science and Technology Policy.
Researchers should submit the text and data mining tools and insights they develop in response to this call to action via the Kaggle platform, a machine learning and data science community owned by Google Cloud. Kaggle's tools are openly available for researchers around the world, the White House said.
"It's difficult for people to manually go through more than 20,000 articles and synthesize their findings," said Anthony Goldbloom, cofounder and CEO of Kaggle, in a statement. "Recent advances in technology can be helpful here. We're putting machine readable versions of these articles in front of our community of more than four million data scientists. Our hope is that AI can be used to help find answers to a key set of questions about COVID-19."