
Speech technology: Does it matter to SMBs?

Computer voice command and speech recognition are "sexy" technologies that everyone likes to talk about, but are they ready for prime time – especially in the small and midsize business (SMB) environment?

Last month, I attended the MVP Summit at the Seattle Convention Center and the Microsoft Campus in Redmond. Bill Gates gave the keynote speech for the first time in a couple of years, and one thing that was clear from his talk is that speech is one of the technologies he's most excited about. Windows Vista has a much improved voice recognition engine built into the operating system, and I believe Microsoft will be focusing even more closely on speech in the future.

Voice command, speech-to-text, and text-to-speech are the esoteric types of features that make good copy; the whole idea conjures up images of futuristic sci-fi movies set on other planets in the twenty-third century, where keyboards and mice are a thing of the past and we interact with computers in the same conversational manner we use with fellow human beings. But what about here and now? What business value, if any, does speech technology offer SMBs today? Let's take a look at what's out there, what's coming soon, and how it fits into your business strategy.

How speech technology works

There are three main parts to speech technology (a short code sketch illustrating all three follows the list):

  • Voice command: A set of predefined spoken commands is recognized by the computer and used to perform tasks normally done by clicking menus and buttons or typing keyboard shortcuts (for example, opening a document, running a program, saving a file).
  • Speech-to-text: The computer "listens to" the words spoken and transcribes them into text on the screen, usually within a word processing program such as Microsoft Word.
  • Text-to-speech: The computer "reads" text documents aloud, sending the audio to the computer's sound card.
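To make these three pieces concrete, here's a minimal sketch in Python. It uses the third-party SpeechRecognition and pyttsx3 packages, which are not the Vista engine discussed in this article; the command phrases and placeholder actions are hypothetical, so treat this as an illustration of the concepts rather than a recipe for any particular product.

    # Illustration only: SpeechRecognition and pyttsx3 are third-party Python
    # packages, not the Vista speech engine discussed in this article.
    import speech_recognition as sr   # pip install SpeechRecognition
    import pyttsx3                    # pip install pyttsx3

    recognizer = sr.Recognizer()

    # Speech-to-text: capture one utterance from the default microphone and transcribe it.
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)   # background noise hurts accuracy
        audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio)      # hand the audio to a recognition service
    except (sr.UnknownValueError, sr.RequestError):
        text = ""                                      # the engine couldn't make out the words

    # Voice command: match the transcription against a small set of predefined commands.
    spoken = text.lower()
    if spoken.startswith("open document"):
        print("(a real application would open the requested document here)")
    elif spoken.startswith("save file"):
        print("(a real application would save the current file here)")
    else:
        print("Transcribed:", text)

    # Text-to-speech: read the result back through the sound card.
    tts = pyttsx3.init()
    tts.say(text or "I did not catch that.")
    tts.runAndWait()

Even this toy example hints at why voice command is the easiest of the three to get right: matching a handful of fixed phrases is far more forgiving than transcribing arbitrary dictation.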

Advantages of speech technology to SMBs

A generation ago, executives and professionals didn't need to know how to type. They routinely dictated their correspondence to secretaries who took shorthand, or into tape recorders whose recordings were later transcribed by those in the "typing pool." Information was entered into computers (if the company had computers) by data entry clerks.

Today keyboarding is taught in elementary school and most office workers have some level of typing skill. However, many still aren't highly proficient at it, and we can all speak more quickly than we can type. In addition, carpal tunnel syndrome is a real problem that makes it painful for many people to type on a computer keyboard for long periods of time. And there are situations in which we can't (or shouldn't) be operating a keyboard, such as when we're driving a car. We can safely speak while driving, though, and could get work done if we could input information to our computers by voice.

Speech technology also makes it easier for disabled persons to use computers. Those with severe arthritis or with injured or amputated hands can use speech recognition to input data into the computer, and the blind can use text-to-speech to have electronic documents read aloud to them.

Speech technology is also an important part of the unified communications model, by which workers can, for example, call in to their email servers and have their messages read to them over the phone, dictate replies, or use voice commands to delete, move, or forward messages.
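As a rough sketch of that idea, the snippet below maps recognized command phrases to mailbox actions. The handler functions, message IDs, and command keywords are all hypothetical; a real unified communications product would supply its own interfaces.

    # Hypothetical sketch: dispatch a transcribed phrase to an email action.
    def delete_message(msg_id):
        print(f"deleting message {msg_id}")

    def forward_message(msg_id):
        print(f"forwarding message {msg_id}")

    def move_message(msg_id):
        print(f"moving message {msg_id} to another folder")

    COMMANDS = {
        "delete": delete_message,
        "forward": forward_message,
        "move": move_message,
    }

    def handle_spoken_command(transcribed, current_msg_id):
        """Run the first action whose keyword appears in the transcribed phrase."""
        for keyword, action in COMMANDS.items():
            if keyword in transcribed.lower():
                action(current_msg_id)
                return
        print("Command not recognized; reading the message aloud instead.")

    # Example: a caller says "please forward this message" while message 42 is open.
    handle_spoken_command("please forward this message", current_msg_id=42)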

Drawbacks of speech technology for SMBs

Although using your voice may seem like the most natural and easiest way to interface with your computer, it may not work as well in practice as it does in theory.

Pronunciation differences and homonyms

One problem is the huge range of variation in human voice tone, accent, dialect, and individual pronunciation. People from one part of the country may pronounce "oil" as "awl," while those in another region say "Cuber" for "Cuba."

To complicate matters further, the English language has many homophones: words that sound the same but are spelled differently (e.g., there, they're, and their, or to, two, and too).

This is why speech-to-text is the most difficult of the three speech technologies to implement well. It's a tremendous challenge for the computer to accurately recognize words that different people pronounce very differently, or to differentiate between words that sound exactly alike.

The result is that transcribed speech often includes many errors that have to be corrected, either by keyboard or by voice. Once you factor in the time spent on error correction, most touch typists find that they can create a finished document using the keyboard more quickly than they can do so using speech.

However, speech engines have been improving steadily and error rates are coming down. This is especially true if users take the time to "train" the speech engine extensively by reading many passages to the computer so it can become familiar with their voice tone and pronunciation. It takes longer to train the computer if you have a heavy Boston or Texas accent, for example, than if you have standard Midwest "TV anchor" speech patterns, but it can be done.

If you're considering implementing speech-to-text, you need to allocate sufficient time for voice training, assess your users' current typing speed, and determine whether speech will improve their productivity or slow them down.

The business environment

Another consideration is the physical environment in which users work. If everyone has a private office, or if workers are separated by soundproof barriers or widely spaced, speech technologies will work better. If you have workers sitting close to one another in a crowded space, one user's microphone may pick up what's being said at the next desk and transcribe it. You can reduce this problem by having workers use headset microphones rather than desktop models (headsets also usually improve the overall accuracy of speech recognition considerably).

However, even if microphones are sufficiently directional, having all your workers talking to their computers constantly may create a noisy, chaotic atmosphere that’s unpleasant to work in and leads to loss of concentration and lowered productivity.

Summary

Speech technology is, in a way, the "holy grail" for those who design user interfaces. The technology has come a long way, and software companies are dedicated to making it better. With a pretty good speech recognition engine now built into Vista, more applications will be written to take advantage of speech.

Speech is undoubtedly the wave of the future and the day will probably come when it’s the standard way to interact with computers. But before you throw away all the keyboards at your small or midsize company, assess your employees’ skills and how they work to determine whether using speech technology will benefit them.

You may find that speech is appropriate for some workers and not others, and that for many people, a combination of speech and keyboard input (for example, using voice commands but typing long documents in the traditional way) is the most efficient use of speech technology at this point in time.

About

Debra Littlejohn Shinder, MCSE, MVP is a technology consultant, trainer, and writer who has authored a number of books on computer operating systems, networking, and security. Deb is a tech editor, developmental editor, and contributor to over 20 add...

16 comments
d.jeffs

I have been using Dragon Naturally Speaking since 1997. As new releases came out, I upgraded. A couple of years ago, Version 8.0 was very good, with about 98% accuracy. Then Nuance Communications bought them. Last fall Version 9.0 emerged. The speech engine is much faster, and I get about 99% accuracy. On my P4 3.4 GHz PC it keeps up with my dictation, which is the first time that has happened. This is the good news. Anything less than the "Preferred" version, which runs about $199.00, is not worth having. There are sales periodically for about $149.99.

Tom_geraghty

My wife has chronic RSI, meaning that she cannot type or use a mouse for more than a couple of minutes without it causing severe pain. After months and months of discussion, her employers agreed to purchase a foot-operated mouse, plus Dragon software, which has meant that she can carry out a full day's work, write up notes and reports, and even do the shopping online, without using her arms. Without speech recognition, she'd be out of a job.

SObaldrick

Always write out the meaning of your acronym the first time you use it. I have no idea what an SMB is, and I don't intend to read the article until I see it spelt out. Les.

Endoscopy

One of the things that a good transcriptionist does, in addition to typing what is said, is correct mistakes in grammar and content. A person can make sense out of what is almost babble because they know the person speaking or are very knowledgeable about what is being said. That knowledge also lets them prevent embarrassing or costly errors in what is said. A computer will not have this knowledge, and careful proofreading will be required, adding a layer of cost to the process of using this in SMBs.

Tom_geraghty

In my experience, the secretaries can make as many mistakes as they find. Not being experts in the field in which they might be typing, they don't always understand or know how to spell technical terms, which I'd gamble that speech recognition might. I do agree that there isn't a true substitute for human error-checking though (but speech recog is a hell of a lot cheaper!)

Endoscopy

I am knowledgeable about the medical field. In that field, a good transcriptionist takes a course in medical vocabulary. They have a variety of books on their desks for terminology for various specialties. The software they use has special medical spell checkers. These people are not secretaries. I would believe transcriptionists in other fields would be the same. Where I live, a doctor's office decided to go to voice recognition software. They are living a nightmare after 1 year of using it. Doctors often talk like they write. These doctors now have to slow down, enunciate correctly, separate words, and organize what they are going to say before they speak. They used to be able to say "Use standard form X with the following differences." Now they have to say the whole thing. This takes time and effort they would rather give to the patients. I imagine the same would hold true for many other disciplines.

American-Tech

Well, I guess I assumed everyone used it. SMB usually (at least in this context) stands for Small & Medium Business.

blarman

I would have to contend that this article grossly overestimates the number of poor typists. I would like to point out two technologies the author fails to consider that I believe play a fundamental and inhibiting role in the emergence of voice-recognition: chat or instant messaging and spellchecking. Though I will not vouch for the grammar I sometimes see in chat channels, my typing has improved from 40 WPM to almost 60 mainly due to the fact that I use instant messengers a lot - either at work or in online games. In addition, spellchecking (even this window has it now) makes it very easy to correct even poor grammar and spelling, meaning that (although it doesn't catch everything), it enables one to make a pretty good first attempt at any written document. Though I agree with the author that speech recognition is slowly gaining strength, I think that it has been slow mainly due to improving typing skills, not the converse.

nickdfg

I am dyslexic. I tended to type a word as it sounds. And though I knew some of the rules (like K before N ... sometimes), it was boring having to get someone else to read a spell-checked doc ~ because of "to, too and two". In 1999 I tried out three products: from Dragon, IBM and Learnout&Hauspie (since bought by Dragon). I found the L&H product (VoiceXpress) suited me best, then the IBM ViaVoice product. The DragonDictate did not work for me. Dragon no longer market the VoiceXpress Pro, though the one I have works very well with my current setup of XP Pro and Office 2003. Then two years ago a friend had a mini-stroke; her English accent became a "French / Polish" accent and she could not remember how to spell. She tried my DragonDictate and IBM ViaVoice. She found the IBM product suited her better. She is also a Windows user. Neither of us are office-bound workers. We use the products because they work for us. We persevered with the profile setup because of the benefits to us. And the products learn as we use them. Now the s/w gets it right 99%+ of the time, and we would not be without the SR technology. The one thing that both of us found difficult at first was organising our thoughts before we started talking to the PC. It is very easy for emails and other correspondence, when there is a specific point to make, or when a response to queries is necessary. But both of us find that for longer documents (I regularly need to dictate 2,000+ words) we tend to dictate the section headings first, then dictate the paragraphs. This means that I can have the s/w correct any mistakes whilst I'm thinking of the words for the next paragraph. And if the paragraphs need moving around ~ well, using SR doesn't preclude using the mouse or keyboard when it is more comfortable. I also experimented with dictating to a minidisk, then playing it to the SR program. That worked too, which can be beneficial if you need to make notes after a meeting, and don't want to have to type them up later. There is no doubt that SR is not yet 100% accurate "straight out the box". And productivity will go down when people first start to use it ~ as it will with any new product. But SR definitely has benefits to many different types of users, not just for accessibility reasons.

jwlindsey

I started experimenting with speech recognition technology back in the mid-to-late 1990s, first with IBM's ViaVoice. I didn't like it much so set it aside. Then in 1999 my father died and left me with several shoe boxes full of letters from over the years. A couple of those boxes included letters that he and mom wrote to each other before they were married (in 1928). WOW! I was excited to have them! My brother and sister both wanted to read the letters, too. So "speech recognition" to the rescue. I figured I could read the letters into MS Word and then distribute to anyone who wanted to read them... grandchildren included. I obtained a copy of L&H VoiceXpress and found it to be far superior to ViaVoice for my application. Later when Dragon bought out L&H, I switched to Naturally Speaking 7.0 and have been using it ever since. I have resisted any upgrades to the software as the new features don't seem to offer any improvements for what I use it for. I use a directional microphone attached to a headset. Naturally Speaking has an "accuracy center" tool to allow proper volume and audio quality settings. When new words are encountered that aren't in the vocabulary, there is a "training" function to add them as I speak, or to differentiate two words I speak that sound alike (but are not homonyms). My results are not 100%, as the documents still require proofreading for when I get to reading too fast and slur my words together. But for the most part there is enough "intelligence" built into the software to differentiate among the various homonyms in context. But now I can share my parents' letters with all interested parties and they can get to know them better... "Holy smokes! Was my mom a rebellious teenager or what?"

welcomeBeenie

I am one of those people who have looked at speech-to-text and vice versa since it began (from what I know) in 1995. Well, at least text-to-speech emerged around that year. I have had immense help from the latter during long car rides, preparing for speeches and stuff. "Reading" documents like this has been an immense help. What I am looking for now is a reliable speech-to-text. I have tried dragon speak, but it is not reliable, although the price is reasonable (99 USD). Even stating the words clearly gives about a 10-15% error rate. Even though the program is much faster than typing yourself, there still is a lot to correct after having spoken 3-4 pages. Does anybody know about any open source software that can compete with dragon speak? Kind regards, T

TechnOntology

Sorry to be so critical but... I read this pretending to know nothing about speech recognition, translation, or VRU technology. I found nothing in it that I would use to introduce someone to this technology, and nothing that adds to what any technologist or practitioner knows. To the author - I recommend you practice your presentation with a technical person, and when you have something they feel is contributory, then publish it. Regards :-)

willjurgens

As an SMB owner I have found exactly the level of information I need to underscore my understanding of the current state of the technology without a host of tech trees hiding the view. It was very useful to me in my decision making.

tfsimpkins

I'm still at a "may or may not work" point or a "works for some but not all" crossroad on this topic. I would have preferred a more one-sided decision. As far as speech recognition software goes, I don't think they'll get it down to an exact science for quite a while. I noticed that he threw in Windows Vista's voice recognition package. Is he an MS rep or something? Are Vista sales still down? MS can't even get the operating system exactly the way they want it, and we are supposed to trust some speech recognition package by MS? There are people here in the U.S. who can't even speak English yet, but we expect a software package so perfect that it can interpret exactly what we say? Highly unlikely for now. That's MY one-sided opinion.

TechnOntology

I'm glad it worked for you. I stand by my original comment tho. The site is TechRepublic which I try not to water down to LowTechRepublic.

American-Tech

Actually, if you'll notice, the author's title for the article was 'Does it matter to SMBs?' In that regard, his/her article was very to the point. He/she did not intend to enlighten you about the technology; they were simply explaining how it may or may not help an SMB with productivity/efficiency. And to that, they hit the topic right on the money.
