How to use speech recognition to improve productivity on your smartphone

Typing and swiping on a touchscreen are slow ways to enter text on a phone. Use speech dictation instead. It's faster and more accurate than ever before.

Icon of Microphone on left, greater than sign, Icon of QWERTY keyboard on right
Image: Andy Wolber / TechRepublic

Talking is faster than typing. In 2016, a study by researchers from Stanford University, the University of Washington, and Baidu found that speech dictation was around three times faster than touchscreen typing on mobile devices. The experimenters used short phrases in both English and Mandarin Chinese for their test.

Recognition accuracy matters too, and it has improved significantly since 2015, when Google touted a word error rate of 8%. In late 2016, Microsoft claimed an error rate of 6.3%. A few months later, in March 2017, IBM announced that it had reduced its recognition error rate to 5.5%. Then, in May 2017, Google announced a speech recognition error rate of 4.9%. Amazon, Apple, Baidu, Nuance, and others are also competing to recognize speech best.
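
To make those percentages concrete: word error rate is conventionally the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference text, divided by the number of words in the reference. Here is a minimal Python sketch of that calculation. The function name and the two sample phrases are made up for illustration. They are not the vendors' benchmark data or my test sentences, and real evaluations typically normalize punctuation and casing more carefully than this does.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical phrases: one substituted word out of eight.
reference = "we met at noon to talk about science"
hypothesis = "we mint at noon to talk about science"
print(f"{word_error_rate(reference, hypothesis):.1%}")  # prints 12.5%
```

By this measure, a 4.9% error rate works out to roughly one wrong word in every twenty spoken.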

I ran my own informal speech recognition test in 2015, when I tried the native speech dictation capabilities on an iPhone and, in Google Docs, both Google's own voice-typing system and a third-party service.

In June 2017, I tested four different speech recognition systems using the same two-sentence phrase I'd used in 2015: Apple's Siri voice dictation system on iOS, Google's Gboard keyboard voice typing, the "Voice typing" option in Google Docs (used on a Chromebook), and Nuance Communications' Dragon Anywhere app. All of these systems are free, except Dragon Anywhere, which costs $15 per month. The Dragon Anywhere app supports voice edits to a document on a mobile device, as does Google's "Voice typing" on a desktop.

Screenshot showing text and three errors (described in text elsewhere)

In my test, nearly all speech dictation systems handled numbers, punctuation, and proper names well. Gboard and Dragon Anywhere processed my words perfectly.

Accuracy

I spoke the same phrase I used in 2015 (see image), which included spoken punctuation: I said "...interest colon science period," hoping to see "interest: science." appear. Both Gboard's voice dictation option and Dragon Anywhere captured and transcribed the sentences perfectly. The only difference was that Gboard displayed the word "twelve" instead of "12" following the year 1660. The word may be a better choice than the number, since two numbers in sequence might confuse a reader. Google voice typing on a Chromebook made one error ("Mint" instead of "met"), while the native iOS dictation system made two ("mad" and "is" instead of "met" and "as").

I experimented with dictating other phrases, as well. None of the transcription systems delivered 100% accuracy all the time, but all of the options created usable transcriptions with only a few minor errors.

Chart showing time for speech recognition (18 seconds), Chromebook keyboard (35 seconds), and touchscreen typing (90+ seconds).

For me, speaking was considerably faster than keyboard-based text input: about 18 seconds for speech recognition, compared with more than 90 seconds for touchscreen typing.

Speed

Touchscreen input was slowest for me, with little difference between tapping on Apple's native keyboard and swiping words with Gboard. The touchscreen keyboard methods took me a little over a minute and a half to enter the 41 words of text accurately. Apple's autocomplete performed well, while it took me a while to correct word errors when swiping with Gboard.

I typed the text with a physical keyboard in about 35 seconds, a bit more than one-third of the time it took me to enter it on a touchscreen keyboard. A proficient typist could likely type it even faster.

Talking was the fastest way to input text. That surprised me. I tried several different phrases to make sure it wasn't an anomaly. It wasn't. Every time, speech dictation was the fastest way to enter text: the test sentences took me about 18 seconds to say.
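
If you'd like to compare your own results, the timings convert easily to words per minute. The short Python sketch below does the arithmetic with my approximate numbers: 41 words, about 18 seconds for speech, about 35 seconds on a physical keyboard, and 90 seconds for the touchscreen. The 90 is a rounded-down stand-in, since touchscreen entry actually took me a little longer, so the real touchscreen rate is slightly lower than what this prints. The variable and function names are just for illustration.

```python
WORD_COUNT = 41  # words in the test passage

# Approximate timings in seconds; the touchscreen value is a lower bound.
timings_seconds = {
    "speech dictation": 18,
    "physical keyboard": 35,
    "touchscreen keyboard": 90,
}


def words_per_minute(words: int, seconds: float) -> float:
    """Convert a word count and elapsed seconds to words per minute."""
    return words / seconds * 60


fastest = min(timings_seconds.values())
for method, seconds in timings_seconds.items():
    wpm = words_per_minute(WORD_COUNT, seconds)
    slowdown = seconds / fastest  # how many times longer than the fastest method
    print(f"{method}: {wpm:.0f} wpm, {slowdown:.1f}x the fastest time")
```

With these inputs, speech comes out at roughly 137 words per minute, the physical keyboard at about 70, and the touchscreen at no more than about 27. Swap in your own word count and timings to see where you land.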

Of course, voice input isn't always optimal. If you're trying to figure out what to say, then speed doesn't really matter. And in a noisy or public environment, many people may prefer to swipe or tap.

Talk, then edit

If you currently use a touchscreen keyboard for email or messaging on your phone, a switch to speech input might save you a significant amount of time.

My experiment convinced me I need to type less on my phone. For long documents, I'll reach for a physical keyboard. But for most email and messages, I should tap the microphone, talk, then review and correct any errors.

Have you recently tested how long it takes you to talk vs. type text? (Try it!) Let me know how accurate and fast you've found speech recognition systems to be.

About Andy Wolber

Andy Wolber helps people understand and leverage technology for social impact. He resides in Ann Arbor, MI with his wife, Liz, and daughter, Katie.