Microsoft's AI can now understand speech better than humans

According to new research from Microsoft, its AI technology can recognize speech with roughly the same error rate as a professional human transcriptionist.

Image: Microsoft

Microsoft's artificial intelligence (AI) technology can now recognize conversational speech slightly better than humans who do so professionally, according to recently released research from the company. Microsoft cut its AI's speech recognition error rate from 6.3% to about 5.9%, putting it slightly below the human error rate, which is also close to 5.9%.

"We [improved] on our recently reported conversational speech recognition system by about 0.4%, and now exceed human performance by a small margin," the report stated.

This news comes a mere month after Microsoft announced that it had reached an error rate of 6.3%, at the time a record among its peers. However, Microsoft's research also noted that the error rates of human transcribers can vary from 4.1% to 9.6%, depending on how carefully they perform the transcription.

Still, the closeness in quality to human transcription is impressive. And it's a breakthrough that will likely have a lasting impact on Microsoft's AI tools and personal digital assistants, like Cortana.

As The Verge reported, citing a statement from the company, Microsoft's chief speech scientist Xuedong Huang said the team had "reached human parity" and called the improvement in speech recognition "an historic achievement."

According to the research paper, Microsoft achieved this parity by optimizing "convolutional and recurrent neural networks." The company claims that the AI system was trained on 2,000 hours of data.

And, while both the human and AI participants had similar error rates, they got tripped up on different words and phrases. For example, the AI had a difficult time distinguishing between "uh-huh" and "uh" or "um." The human transcriptionist didn't struggle with that distinction as much. Humans also substituted fewer words, but deleted more.
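The error rates discussed above are word error rates (WER), the standard speech recognition metric, which counts exactly the kinds of mistakes mentioned here: substituted, deleted, and inserted words, measured against a reference transcript. A minimal sketch of how WER is typically computed via edit distance follows; the example phrases are illustrative, not drawn from Microsoft's paper.

```python
def wer(reference, hypothesis):
    """Word error rate: minimum number of substitutions, deletions,
    and insertions needed to turn the hypothesis into the reference,
    divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j]: minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # match or substitution
                          d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1)        # insertion
    return d[len(ref)][len(hyp)] / len(ref)

# A system that drops the filler "huh" makes one deletion in five
# reference words, i.e. a 20% word error rate:
print(wer("uh huh that is right", "uh that is right"))  # 0.2
```

Because substitutions, deletions, and insertions all count equally toward the total, two transcribers can land on the same overall WER while making very different kinds of mistakes, which is exactly the human-versus-AI pattern the researchers observed.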

Microsoft's achievement is substantial, but there is still no telling how the system would perform in a real-life setting. AI and its related technologies have been a focus for Microsoft in recent months. CEO Satya Nadella recently laid out the company's four pillars for democratizing AI, and said that its Azure cloud platform is becoming the first AI supercomputer.

The 3 big takeaways for TechRepublic readers

  1. Microsoft's AI speech recognition recently achieved roughly the same error rate as human transcriptionists, potentially paving the way for the technology to take over transcription work.
  2. While both humans and the AI had similar error rates, they committed errors in different aspects of the transcriptions.
  3. Microsoft continues to focus heavily on AI, noting its four pillars for AI democratization, and the recent AI focus of its Azure cloud platform.

About

Conner Forrest is News Editor for TechRepublic. He covers startups and enterprise technology and is passionate about the convergence of tech and culture.
