A new artificial intelligence tool created by researchers at Google and Oxford University could significantly improve lip-reading accuracy and comprehension for the hearing impaired. In a recently released paper on the work, the two teams explained how the Google DeepMind-powered system correctly interpreted more words than a trained human expert.
The tool is called Watch, Listen, Attend and Spell (WLAS), and the paper describes it as a "network that learns to transcribe videos of mouth motion to characters." Using videos from the BBC, the team trained the system with a dataset of more than 100,000 natural sentences.
While similar attempts in the past have focused on a narrow set of words, the report said, Google and Oxford wanted to address lip reading through "unconstrained natural language sentences, and in the wild videos."
The professional human lip reader against whom the researchers compared the results had roughly 10 years of experience in the field and had deciphered videos for the royal wedding and for court trials, the report said. The reader was given a sample of 200 videos from the set used to train WLAS, and the videos played for 10 times longer than they did for the AI system.
While the human professional was able to decipher less than a quarter of the words in their sample, the WLAS system deciphered half of the words in its dataset. According to a report from New Scientist, the human expert got 12.4% of the words correct, while the AI system got 46.8% correct.
"The model also surpasses the performance of all previous work on standard lip reading benchmark datasets, and we also demonstrate that visual information helps to improve speech recognition performance even when the audio is used," the report said. It went on to say that the research could help "discern important discriminative cues that are beneficial for teaching lip reading to the hearing impaired."
The research follows similar work from Oxford on another lip-reading system called LipNet, though that project used a much smaller dataset. The news also comes only a couple of months after Google claimed that its DeepMind research arm was able to make AI systems sound much more human.
Microsoft hit some AI milestones recently as well. In October, the company said that its AI technology was able to recognize conversational speech better than professional human transcribers.
The 3 big takeaways for TechRepublic readers
- Oxford University and Google DeepMind have built an AI tool that can read lips far better than a professional human lip-reader, which could help the hearing impaired.
- The model was trained on BBC videos and surpassed the benchmarks set by similar research in the field.
- Google DeepMind also recently managed to make its AI systems sound more human through advances in text-to-speech technology.
Also see
- Microsoft's AI can now understand speech better than humans (TechRepublic)
- Google DeepMind wins again: AI trounces human expert in lip-reading face-off (ZDNet)
- Why making AI sound human is a bad idea (TechRepublic)
- Google's DeepMind claims major milestone in making machines talk like humans (ZDNet)
- How new AI fools humans into thinking artificial sounds are real (TechRepublic)
Conner Forrest has nothing to disclose. He doesn't hold investments in the technology companies he covers.
Conner Forrest is a Senior Editor for TechRepublic. He covers enterprise technology and is interested in the convergence of tech and culture.