Vista is the first Microsoft OS that has built-in speech recognition capabilities. Using this feature, you can perform tasks such as starting and closing programs, saving and deleting files, dictating text to be typed verbatim into a document, and editing the text. Deb Shinder shares her experiences working with Vista speech recognition and explains the available options.
Ever since Star Trek made it seem commonplace, many computer users have dreamed of being able to throw away their keyboards, exterminate their mice, and control their computers with their voices. Programs that make it possible to issue commands or dictate text to your computer have been around for many years and have proven especially useful to those who are physically unable to use other input methods. But such programs have never really gained widespread popularity.
Windows Vista is the first Microsoft operating system to come with speech recognition built in. Previously, speech recognition functionality was a part of Microsoft Office XP or Office 2003 or could be added through third-party software such as Dragon NaturallySpeaking. Microsoft's Voice Command added (limited) speech recognition to Windows Mobile operating systems. In any case, you had to buy and install additional software.
With Vista, it's not necessary to buy anything extra to start talking to your computer. It's not enabled by default, but it's right there in the Control Panel, ready to be set up, as shown in Figure A.
|Vista speech recognition is set up and configured through the Control Panel.|
You can also access the speech recognition feature through the All Programs | Accessories | Ease Of Access menu, as shown in Figure B.
|You can also access speech recognition through the Ease Of Access menu.|
How it works
There are two ways to use speech recognition technology:
- To control the software: Start and close programs and switch between them, save and delete files, and so forth.
- To dictate text to be typed verbatim into a document and edit the text.
Developers can use the Vista speech APIs to add speech recognition capabilities to any application. However, Vista's speech recognition doesn't currently work with all languages. It's available in English (both U.S. and U.K.), German, French, Spanish, Japanese, and Chinese (both Traditional and Simplified).
Setting up and configuring speech recognition
Before you can start using speech recognition, you need to complete the following steps:
- Turn on speech recognition.
- Set up your microphone.
- Complete a tutorial (not required, but recommended).
- Train the recognition engine to understand your voice (not required, but recommended).
When you double-click the Speech Recognition applet in Control Panel or select Speech Recognition from the Ease Of Access menu, the Speech Recognition Options dialog box opens, as shown in Figure C.
|The first step is to configure your speech recognition experience.|
When you click Start Speech Recognition, the Speech control console will appear at the top of your screen, as shown in Figure D.
|The Speech control console appears at the top of the screen when speech recognition is turned on.|
If you have speech recognition configured to start when Windows boots up, the console will appear when you start your computer. There will also be a Speech icon (a white microphone on a blue circular background) in the system tray/notification area when speech recognition is on.
You can select Speech options by right-clicking the microphone icon, either on the control console or in the system tray. This will display the context menu shown in Figure E.
|You can select numerous options from the context menu.|
From the menu, you can select from the following:
- Turn Speech On: The computer will listen to everything you say and attempt to carry out commands it recognizes.
- Sleep: The computer will listen but will not respond to your voice unless/until you say "Start listening."
- Off: The computer will not listen to anything you say.
- Open Speech Reference Card: This is a handy cheat sheet of common commands and how-to information.
- Start Speech Tutorial: This is an interactive video walk-through that teaches you to use speech recognition by actually doing it.
- Help: This opens the Help files for setting up and using speech recognition.
- Options: Here, you can select whether to have speech recognition play audible feedback, run at Startup, speak text in correction dialog, and/or enable dictation everywhere.
- Configuration: Here, you can set up your microphone, improve voice recognition, or open the Speech control panel.
- Open The Speech Dictionary: You can add new words to the dictionary (especially good for adding names and other words that are difficult for the engine to recognize) or prevent certain words from being dictated (words you would never dictate).
- Dictation Topic: The only choice here is Narrative.
- Go To The Speech Recognition Web site.
- Get Information About Speech Recognition: This is the familiar Windows About dialog box that tells you the version/build number and licensee name.
- Open Speech Recognition.
- Exit: This turns off speech recognition and removes the control console from your screen and the Speech icon from the system tray.
Setting up the microphone
You can set up your microphone from the Speech Recognition Options dialog box or the Speech context menu. The microphone setup wizard will first ask you to identify your microphone type (headset, desktop, or other). The wizard recommends that you use a headset, and I can verify that the microphone type makes a huge difference.
The first time I tried to use Vista's speech recognition, I was using a desktop microphone that works fine for such tasks as recording voiceovers for PowerPoint presentations in Camtasia. However, when I tried dictating in Vista, the results were laughable; I was rarely able to dictate a whole sentence without at least one misinterpreted word, regardless of how carefully and clearly I tried to enunciate. After I switched to a headset (which cost about 30 dollars more than the desktop microphone), accuracy improved to the point where mistakes were occurring once every five or six sentences rather than several times per sentence.
After you select the microphone type, the next page of the wizard shows you how to position it correctly for best results. Next, you're asked to read a short bit of text aloud into the microphone, as shown in Figure F.
|You must speak into the microphone so Windows can automatically adjust the volume.|
The microphone is now set up and ready to use. However, that doesn't mean the speech engine is ready to work with your voice. If you have a standard, newscaster non-accent and always enunciate very clearly, you might be able to use speech recognition without training it to your voice. If you have a Texas accent as I do, or any other nonstandard way of speaking, you'll get much better results if you go through the training process.
The training process involves reading a series of text selections aloud, one screen at a time, as shown in Figure G.
|Training the speech engine to recognize your own way of pronouncing words will improve accuracy.|
Using voice commands
Now you're ready to use voice commands to perform tasks on your computer. The speech engine is typically much more accurate at recognizing commands than dictation, because it's listening for only a limited number of commands.
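To illustrate why a small command vocabulary is easier to recognize than free dictation, here is a minimal sketch in Python. It is only an analogy: it works on text rather than audio, it uses the standard library's difflib for fuzzy matching, and the command list, the match_command function, and the cutoff value are all hypothetical choices for this example, not part of Vista's speech engine.

```python
import difflib

# Hypothetical command vocabulary: a handful of phrases mentioned in this article.
COMMANDS = [
    "start listening",
    "stop listening",
    "show numbers",
    "click start",
    "click all programs",
    "tab",
]

def match_command(heard, commands=COMMANDS, cutoff=0.6):
    """Return the known command closest to a noisy transcription, or None."""
    matches = difflib.get_close_matches(heard.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None

if __name__ == "__main__":
    # Slightly garbled input still lands on the intended command,
    # because there are only a few candidates to choose from.
    for heard in ["show numbars", "stop lissening", "click all program", "purple monkey"]:
        print(repr(heard), "->", match_command(heard))
```

With only six candidates, a near miss is still closer to the right command than to any other. Dictation has no such short list to fall back on, which is one plausible reason it produces more errors.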
First, be sure the Speech console shows the speech status as Listening. If it doesn't, say, "Start listening" or right-click the microphone icon and select On: Listen To Everything I Say.
Voice commands are designed to be as intuitive as possible. For example, to open a program from the Start menu:
- Say "Click Start."
- Say "Click All Programs."
- Say "Microsoft Office Word 2007" (or the name of whatever other program you want to open, as it is named in the Programs menu).
Simple commands are easy to use. Navigating around in some programs can be a little more challenging, but you can, for example, tab to the next option by saying "Tab."
What do you do if you want to click a button or link for which you don't know the name, such as the Office logo button at the top-left corner of Word? Here's a nifty trick: just say, "Show numbers," and all interactive elements in the active window will be overlaid with numbers, as shown in Figure H.
|If you need to click on a button or element but don't know its name, just say "Show numbers."|
Now all you have to do is say the number of the button you want to click. An OK box will appear on that element. Say, "OK," and you've clicked the button.
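The "Show numbers" interaction described above can be modeled as a simple two-step lookup: number the elements, then require a spoken confirmation before clicking. The sketch below is a hypothetical Python model; the function names and the element names are invented for illustration, and it does not call any Windows API.

```python
def show_numbers(elements):
    """Assign 1-based numbers to the interactive elements of the active window."""
    return {number: name for number, name in enumerate(elements, start=1)}

def click_by_number(overlay, spoken_number, confirmation):
    """Return the element to click, or None if the number is unknown or unconfirmed."""
    target = overlay.get(spoken_number)
    if target is None:
        return None
    # The article describes an OK box appearing on the element; saying "OK" confirms it.
    if confirmation.strip().lower() != "ok":
        return None
    return target

if __name__ == "__main__":
    # Hypothetical element names for a word processor window.
    overlay = show_numbers(["Office button", "Save", "Undo", "Bold"])
    print(overlay)
    print(click_by_number(overlay, 1, "OK"))
```

The confirmation step matters because a misheard number would otherwise click the wrong element immediately.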
If you don't know how to do something, you can use Help (in English only) by asking, "How do I" followed by the task you want to perform. For example, you might ask, "How do I turn on speech recognition?" Windows will show you a list of Help topics that seem to match your question, as shown in Figure I.
|You can ask how to do a task and Windows will display Help files that match your question.|
You can dictate text into any speech-enabled application. You are not limited to Microsoft Office applications as you were in the past. For example, you can dictate into Notepad or WordPad. You can also dictate into the Windows Live Writer blogging application.
I was not able to dictate into Open Office Writer and other non-Microsoft programs by default. However, after I selected Options | Enable Dictation Everywhere from the Speech context menu (a setting for dictating text into programs that don't automatically accept dictation), I was able to dictate into the Open Office program. It didn't work as well as with Microsoft programs, though: instead of immediately typing the text I spoke, it would pop up a number of alternatives for me to choose from. With the Speech APIs, developers can make their applications speech-enabled (and many more probably will in the future).
When you're dictating, Vista will type everything you say into the document. It can be a little disconcerting if, while working on a document, you stop to talk to a colleague and then find your end of the conversation transcribed into the document. After a while, it becomes second nature to tell Vista, "Stop listening" when you want to say something you don't want transcribed.
It's likely that Vista will make mistakes when transcribing your dictation. The good news is that they're easy to correct. For example, if you say, "I need another byte" and Vista types "I need another bite," you can just say, "Correct bite," and you'll be presented with a list of replacement words, as shown in Figure J.
|Correcting mistakes is easy; you can pick from a list of possible replacements.|
If the correct word isn't in the list, just say, "Spell it." A box will appear where you can spell the word one letter at a time, as shown in Figure K.
|If the word you need isn't in the list, you can spell it.|
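The correction flow described above (pick a replacement from a numbered list, or spell the word when the list doesn't contain it) can be sketched as a small Python function. The correct_word name, its parameters, and the list of alternatives are hypothetical; this is a simplified model that ignores punctuation, not a description of how Vista implements corrections.

```python
def correct_word(text, wrong, alternatives, choice=None, spelled=None):
    """Replace the first occurrence of `wrong` with a chosen or spelled word."""
    if choice is not None:
        replacement = alternatives[choice - 1]  # choices are numbered from 1
    elif spelled is not None:
        replacement = "".join(spelled)          # "Spell it": one letter at a time
    else:
        return text                             # nothing selected, leave text alone
    words = text.split()
    for i, word in enumerate(words):
        if word == wrong:
            words[i] = replacement
            break
    return " ".join(words)

if __name__ == "__main__":
    sentence = "I need another bite"
    # Case 1: the right word is in the list of alternatives.
    print(correct_word(sentence, "bite", ["byte", "bight", "bit"], choice=1))
    # Case 2: it is not in the list, so the word is spelled out.
    print(correct_word(sentence, "bite", ["bight", "bit"], spelled=["b", "y", "t", "e"]))
```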
Advanced configuration settings
You can access several advanced configuration settings by clicking the Advanced Speech Options link in the left pane of the Speech Recognition Options dialog box, as shown in Figure L.
|You can set advanced configuration options and train your profile.|
Here, you can create and train speech recognition profiles. This is useful when more than one person shares the computer. You can also choose whether to run speech recognition at startup and whether to allow the computer to review your documents and mail to improve the accuracy of the speech recognition engine.
In addition, you can select the number of spaces to insert after punctuation marks and adjust the microphone level.
Speech recognition limitations
I was impressed with the ease of use and accuracy of the Vista speech recognition engine after half an hour of training time. I've tried dictation programs before and never found them at all usable; I could always type much faster than I could dictate and correct text. Now I finally feel that if I should ever lose the use of my hands, there would still be a way for me to continue to get my work done. For me, a combination of speech recognition (primarily for commands) and keyboard input works well.
However, I'm using Vista on a high-end computer system that has a Core Duo processor and 4 GB of RAM. I can't vouch for how fast it works on a less powerful computer. I'm also using a headset microphone. As I mentioned, my experience shows that a desktop microphone doesn't work nearly as well. Putting in some time training the engine to your own voice also makes a big difference.
For obvious reasons, speech recognition wouldn't work well in a noisy environment, such as a shared office where other people are talking or on the phone while you work. Nor would it work well if you like to listen to music or talk radio while you work.
Before you decide to start talking to your computer all the time, be aware that there's a security issue involved with using speech recognition. George Ou went into detail about it in his blog. Here's the gist: An attacker could embed a sound file that plays automatically when you go to a Web page or send you a sound file in e-mail that plays when you double-click on it. If the sound file that plays through your computer speakers is a command recognized by Vista's Speech engine, and the speech recognition feature is running, the computer will carry out the command.
This isn't quite as scary as it could be. To perform most administrative tasks in Vista, you have to respond to the User Account Control prompt, which can't be done by voice. However, it's possible for the attacker to delete a file on your computer using this method.
When speech recognition is in Sleep mode, it responds only to the words "Start listening," but the attacker could easily put that phrase at the beginning of the sound file to turn it on. Thus, the best practice is to turn speech recognition off completely when you aren't using it, rather than leaving it in Sleep mode, and to avoid configuring it to run when you start Windows.
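The reasoning behind this advice can be checked with a small simulation. The Python sketch below models the three modes described earlier (listening, Sleep, and Off) and feeds the same hypothetical malicious sound clip to each one. The Mode enum, the play_audio function, and the "delete file" phrase are placeholders invented for this example. It is a toy model of the behavior the article describes, not Vista's actual implementation.

```python
from enum import Enum

class Mode(Enum):
    LISTENING = "listening"
    SLEEP = "sleep"
    OFF = "off"

def play_audio(mode, phrases):
    """Simulate phrases reaching the microphone.

    Returns a tuple of (final mode, list of commands carried out).
    """
    executed = []
    for phrase in phrases:
        text = phrase.strip().lower()
        if mode is Mode.OFF:
            continue                      # Off: nothing is heard at all
        if mode is Mode.SLEEP:
            if text == "start listening":
                mode = Mode.LISTENING     # the only phrase Sleep responds to
            continue
        # From here on, the engine is listening.
        if text == "start listening":
            continue                      # already listening, nothing to do
        if text == "stop listening":
            mode = Mode.SLEEP
        else:
            executed.append(text)         # placeholder for carrying out a command
    return mode, executed

# Hypothetical malicious clip: wake the engine first, then issue a command.
ATTACK_CLIP = ["Start listening", "delete file"]

if __name__ == "__main__":
    for start in Mode:
        final_mode, executed = play_audio(start, ATTACK_CLIP)
        print(start.name, "->", final_mode.name, executed)
```

In this model, Sleep offers no more protection than listening mode, because the clip simply wakes the engine before issuing its command. Only Off leaves the list of executed commands empty.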