IBM Watson machine learns the art of writing a good headline

The team behind IBM's natural language processing and machine learning engine has created a deep learning-based system whose document summaries rival those written by humans.

An IBM Watson system.
Image: IBM

IBM Watson's ability to answer questions is being put to good use in fields ranging from healthcare to finance.

Watson's natural language processing and machine learning engine underpins a suite of language recognition, computer vision and data analytics services offered by IBM, and behind the scenes researchers continue to refine the smart system's capabilities.

The latest breakthrough by the team working on Watson's question-answering algorithms is a "state-of-the-art" system for automatically summarising documents.

The team used a deep learning approach, previously used for machine translation and to automatically caption videos, to produce short summaries of millions of English newswire reports.

"In this work, we focus on the task of text summarization, which can also be naturally thought of as mapping an input sequence of words in a source document to a target sequence of words called summary," write IBM US researchers Ramesh Nallapati, Bing Xiang and Bowen Zhou in the paper.

The deep learning-based sequence-to-sequence approach they used is more commonly applied to machine translation. The team writes that summarising text differs significantly: the summary is typically short, its length doesn't depend heavily on the length of the document and, unlike machine translation, it's acceptable to omit all but the key concepts in the source material.

Despite these differences, this approach of using an attentional encoder-decoder recurrent neural network to summarise text "significantly outperforms" the recent state-of-the-art model used by Facebook to generate summaries.
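To give a rough sense of what "attentional" means here, the sketch below shows a single attention step in plain Python: at each point while writing the summary, the decoder scores every encoded source word, turns the scores into weights and takes a weighted average as its context. This is an illustration only, not the IBM team's model. The dot-product scoring, the function names and the toy numbers are all assumptions made for brevity.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoding step.

    Dot-product scoring is an assumption; real systems often use
    a small learned network to compute the scores instead.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context is the weighted average of the encoder states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy encoder states, one vector per source word (made-up numbers).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
print(weights)
print(context)
```

In this toy run the first and third source words receive equal, larger weights than the second, because their vectors line up more closely with the decoder state.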

Of the summaries the system generates, the IBM team writes: "They are surprisingly good and would easily pass muster for a human generated summary in most cases."

"Our results strongly demonstrate that the sequence-to-sequence models are extremely promising for summarization."

The team's future work will focus on investigating ways to effectively generate rare words in the summary, which they say "appears to be a glaring weakness in the existing models".
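The article does not say why rare words are hard. One common explanation, offered here as an assumption rather than the IBM team's account, is that models of this kind choose each output word from a fixed vocabulary of frequent words, so anything outside it collapses into a placeholder. The sketch below illustrates the effect; the `<unk>` token, the function names and the tiny corpus are all hypothetical.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary words


def build_vocab(sentences, max_size):
    """Keep only the max_size most frequent words in the corpus."""
    counts = Counter(word for s in sentences for word in s.split())
    return {word for word, _ in counts.most_common(max_size)}


def restrict(sentence, vocab):
    """Replace every word outside the vocabulary with the placeholder."""
    return [w if w in vocab else UNK for w in sentence.split()]


# Made-up corpus: "ibm" appears three times, "watson" twice, the rest once.
corpus = [
    "ibm unveils watson update",
    "ibm watson summarises news",
    "ibm hires nallapati",
]
vocab = build_vocab(corpus, max_size=2)

print(restrict("ibm hires nallapati", vocab))
# prints ['ibm', '<unk>', '<unk>']
```

With a vocabulary of only two words, the name in the last headline cannot be reproduced, which is the kind of gap a summariser would need to close.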

Creating machines that can summarise a text in a way that captures its core meaning is important if we want computers to begin to gain a human-like understanding of language. Demand for automated summaries and computer-generated reports is also growing as technology advances to the point where it can output competent write-ups. Recently Narrative Science, which offers the automated report-writing service Quill, said its revenues are doubling each year.

As well as offering Watson services to developers building third-party apps, IBM is, CEO Ginni Rometty recently said, "investing aggressively in new opportunities like Watson Health, Watson Internet of Things" as the technology giant attempts to counter declines in its traditional business areas.

This focus on new technologies such as Watson is, according to Credit Suisse analyst Kulbinder Garcha, part of a "painful multi-year turnaround", as IBM attempts to reduce its reliance on earnings from its hardware, operating systems and traditional services businesses, which are being squeezed by a gradual shift to cloud computing.



Nick Heath is chief reporter for TechRepublic. He writes about the technology that IT decision makers need to know about, and the latest happenings in the European tech scene.
