Microsoft's artificial intelligence (AI) technology can now recognize conversational speech slightly better than humans who do so professionally, according to recently released research from the company. Microsoft recently got its AI's error rate in understanding speech down to about 5.9% from 6.3%, putting it slightly below the human error rate, which is itself close to 5.9%.
"We [improved] on our recently reported conversational speech recognition system by about 0.4%, and now exceed human performance by a small margin," the report stated.
This news comes a mere month after Microsoft announced that it had reached an error rate of 6.3%, at the time setting a record among its peers. However, Microsoft's research also noted that error rates of human transcribers can vary between 4.1% and 9.6%, depending on how carefully they perform the transcription.
Still, the closeness in quality to human transcription is impressive. And it's a breakthrough that will likely have a lasting impact on Microsoft's AI tools and personal digital assistants, like Cortana.
As reported by The Verge citing a statement from the company, Microsoft's chief speech scientist Xuedong Huang said that they had "reached human parity," and called the improvement in speech recognition "an historic achievement."
According to the research paper, Microsoft achieved this parity by optimizing "convolutional and recurrent neural networks." The company claims that the AI system was trained on 2,000 hours of data.
And, while both the human and AI participants had similar error rates, they got tripped up on different words and phrases. For example, the AI had a difficult time distinguishing between "uh-huh" and "uh" or "um." The human transcriptionist didn't struggle with that distinction as much. Humans also substituted fewer words, but deleted more.
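To make the substitution and deletion counts concrete, here is a minimal Python sketch of how a transcription error rate can be tallied. It assumes the figures refer to the standard word error rate (substitutions plus deletions plus insertions, divided by the number of words in the reference transcript), which the article itself does not spell out. The function names and the sample sentences are hypothetical and are not taken from Microsoft's paper.

```python
def error_counts(reference, hypothesis):
    """Return (total, substitutions, deletions, insertions) for the
    cheapest word-level alignment of hypothesis against reference."""
    ref = reference.split()
    hyp = hypothesis.split()

    # dp[i][j] holds the counts for aligning ref[:i] with hyp[:j].
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word was dropped
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word is extra

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            total, subs, dels, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (total, subs, dels, ins)  # words match
            else:
                diagonal = (total + 1, subs + 1, dels, ins)  # substitution
            total, subs, dels, ins = dp[i - 1][j]
            deletion = (total + 1, subs, dels + 1, ins)
            total, subs, dels, ins = dp[i][j - 1]
            insertion = (total + 1, subs, dels, ins + 1)
            # Tuples compare on total errors first, so min() keeps the
            # cheapest alignment; ties are broken arbitrarily but consistently.
            dp[i][j] = min(diagonal, deletion, insertion)

    return dp[len(ref)][len(hyp)]


def word_error_rate(reference, hypothesis):
    """Total errors divided by the number of reference words."""
    ref_len = len(reference.split())
    if ref_len == 0:
        raise ValueError("reference transcript must contain at least one word")
    return error_counts(reference, hypothesis)[0] / ref_len


# Hypothetical example: "uh-huh" is misheard as "uh" and "so" is dropped.
print(error_counts("uh-huh i think so", "uh i think"))
print(word_error_rate("uh-huh i think so", "uh i think"))
```

In this made-up example, one substituted word and one deleted word out of four reference words give an error rate of 50%. Two transcribers can therefore land on the same overall rate while making very different kinds of mistakes.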
Microsoft's achievement is substantial, but there is still no telling how it would work in a real-life setting. AI and its related technologies have been a focus for Microsoft in recent months. CEO Satya Nadella recently laid out the company's four pillars for democratizing AI, and said that its cloud platform Azure is becoming the first AI supercomputer.
The 3 big takeaways for TechRepublic readers
- Microsoft's AI speech recognition recently achieved roughly the same error rate as human transcriptionists, potentially paving the way for the technology to replace jobs.
- While both humans and the AI had similar error rates, they committed errors in different aspects of the transcriptions.
- Microsoft continues to focus heavily on AI, noting its four pillars for AI democratization, and the recent AI focus of its Azure cloud platform.
Conner Forrest has nothing to disclose. He doesn't hold investments in the technology companies he covers.
Conner Forrest is a Senior Editor for TechRepublic. He covers enterprise technology and is interested in the convergence of tech and culture.