Speech recognition in Windows Vista

Vista is the first Microsoft OS that has built-in speech recognition capabilities. Using this feature, you can perform tasks such as starting and closing programs, saving and deleting files, dictating text to be typed verbatim into a document, and editing the text. Deb Shinder shares her experiences working with Vista speech recognition and explains the available options.

This article is also available as a PDF download and a gallery.

Ever since Star Trek made it seem commonplace, many computer users have dreamed of being able to throw away their keyboards, exterminate their mice, and control their computers with their voices. Programs that make it possible to issue commands or dictate text to your computer have been around for many years and have proven especially useful to those who are physically unable to use other input methods. But such programs have never really gained widespread popularity.

Windows Vista is the first Microsoft operating system to come with speech recognition built in. Previously, speech recognition functionality was a part of Microsoft Office XP or Office 2003 or could be added through third-party software such as Dragon NaturallySpeaking. Microsoft's Voice Command added (limited) speech recognition to Windows Mobile operating systems. In any case, you had to buy and install additional software.

With Vista, it's not necessary to buy anything extra to start talking to your computer. Speech recognition isn't enabled by default, but it's right there in the Control Panel, ready to be set up, as shown in Figure A.

Figure A

Vista speech recognition is set up and configured through the Control Panel.

You can also access the speech recognition feature through the All Programs | Accessories | Ease Of Access menu, as shown in Figure B.

Figure B

You can also access speech recognition through the Ease Of Access menu.

How it works

There are two ways to use speech recognition technology:

  • To control the software: Start and close programs and switch between them, save and delete files, and so forth.
  • To dictate text to be typed verbatim into a document and edit the text.

Developers can use the Vista speech APIs to add speech recognition capabilities to any application. However, Vista's speech recognition doesn't currently work with all languages. It's available in English (both U.S. and U.K.), German, French, Spanish, Japanese, and Chinese (both Traditional and Simplified).

Setting up and configuring speech recognition

Before you can start using speech recognition, you need to complete the following steps:

  • Turn on speech recognition.
  • Set up your microphone.
  • Complete a tutorial (not required, but recommended).
  • Train the recognition engine to understand your voice (not required, but recommended).

When you double-click the Speech Recognition applet in Control Panel or select Speech Recognition from the Ease Of Access menu, the Speech Recognition Options dialog box opens, shown in Figure C.

Figure C

The first step is to configure your speech recognition experience.

When you click Start Speech Recognition, the Speech control console will appear at the top of your screen, as shown in Figure D.

Figure D

The Speech control console appears at the top of the screen when speech recognition is turned on.

If you have speech recognition configured to start when Windows boots up, the console will appear when you start your computer. There will also be a Speech icon (a white microphone on a blue circular background) in the system tray/notification area when speech recognition is on.

You can select Speech options by right-clicking the microphone icon, either on the control console or in the system tray. This will display the context menu shown in Figure E.

Figure E

You can select numerous options from the context menu.

From the menu, you can select from the following:

  • Turn Speech On: The computer will listen to everything you say and attempt to carry out commands it recognizes.
  • Sleep: The computer will listen but will not respond to your voice unless/until you say "Start listening."
  • Off: The computer will not listen to anything you say.
  • Open Speech Reference Card: This is a handy cheat sheet of common commands and how-to information.
  • Start Speech Tutorial: This is an interactive video walk-through that teaches you to use speech recognition by actually doing it.
  • Help: This opens the Help files for setting up and using speech recognition.
  • Options: Here, you can select whether to have speech recognition play audible feedback, run at Startup, speak text in correction dialog, and/or enable dictation everywhere.
  • Configuration: Here, you can set up your microphone, improve voice recognition, or open the Speech control panel.
  • Open The Speech Dictionary: You can add new words to the dictionary (especially good for adding names and other words that are difficult for the engine to recognize) or prevent certain words from being dictated (words you would never dictate).
  • Dictation Topic: The only choice here is Narrative.
  • Go To The Speech Recognition Web site.
  • Get Information About Speech Recognition: This is the familiar Windows About dialog box that tells you the version/build number and licensee name.
  • Open Speech Recognition.
  • Exit: This turns off speech recognition and removes the control console from your screen and the Speech icon from the system tray.

Setting up the microphone

You can set up your microphone from the Speech Recognition Options dialog box or the Speech context menu. The microphone setup wizard will first ask you to identify your microphone type (headset, desktop, or other). The wizard recommends that you use a headset, and I can verify that the microphone type makes a huge difference.

The first time I tried to use Vista's speech recognition, I was using a desktop microphone that works fine for such tasks as recording voiceovers for PowerPoint presentations in Camtasia. However, when I tried dictating in Vista, the results were laughable; I was rarely able to dictate a whole sentence without at least one misinterpreted word, regardless of how carefully and clearly I tried to enunciate. After I switched to a headset (which cost about 30 dollars more than the desktop microphone), accuracy improved to the point where mistakes were occurring once every five or six sentences rather than several times per sentence.

After you select the microphone type, the next page of the wizard shows you how to position it correctly for best results. Next, you're asked to read a short bit of text aloud into the microphone, as shown in Figure F.

Figure F

You must speak into the microphone so Windows can automatically adjust the volume.

The microphone is now set up and ready to use. However, that doesn't mean the speech engine is ready to work with your voice. If you have a standard, newscaster non-accent and always enunciate very clearly, you might be able to use speech recognition without training it to your voice. If you have a Texas accent as I do, or any other nonstandard way of speaking, you'll get much better results if you go through the training process.

The training process involves reading a series of text selections, one screen at a time, as shown in Figure G.

Figure G

Training the speech engine to recognize your own way of pronouncing words will improve accuracy.

Using voice commands

Now you're ready to use voice commands to perform tasks on your computer. The speech engine is typically much more accurate at recognizing commands than dictation, because it's listening for only a limited number of commands.
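
To see why a small command set helps, consider a toy sketch in Python. When a recognizer only has to choose among a handful of known phrases, a slightly misheard utterance can still be snapped to the closest one. This is only an illustration using text similarity, not a description of how Vista's engine works internally; the command list and the match_command helper are hypothetical names made up for this example.

```python
import difflib

# Hypothetical command set; the phrases are borrowed from this article's examples.
COMMANDS = [
    "start listening",
    "stop listening",
    "show numbers",
    "click start",
    "tab",
    "ok",
]

def match_command(heard, commands=COMMANDS, cutoff=0.6):
    """Return the known command closest to the heard text, or None.

    The cutoff is an arbitrary similarity threshold chosen for this sketch.
    """
    matches = difflib.get_close_matches(heard.lower(), commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

Here a near miss such as match_command("show number") resolves to "show numbers", while text that resembles no command returns None. Free dictation has no such short list to fall back on, which fits the lower accuracy described above.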

First, be sure the Speech console shows the speech status as Listening. If it doesn't, say, "Start listening" or right-click the microphone icon and select On: Listen To Everything I Say.

Voice commands are designed to be as intuitive as possible. For example, to open a program from the Start menu:

  • Say "Click Start."
  • Say "Click All Programs."
  • Say "Microsoft Office Word 2007" (or the name of whatever other program you want to open, as it is named in the Programs menu).

Simple commands are easy to use. Navigating around in some programs can be a little more challenging, but you can, for example, tab to the next option by saying "Tab."

What do you do if you want to click a button or link for which you don't know the name, such as the Office logo button at the top-left corner of Word? Here's a nifty trick: just say, "Show numbers," and all interactive elements in the active window will be overlaid with numbers, as shown in Figure H.

Figure H

If you need to click on a button or element but don't know its name, just say "Show numbers."

Now all you have to do is say the number of the button you want to click. An OK box will appear on that element. Say, "OK," and you've clicked the button.
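
The two-step "say a number, then confirm" flow can be modeled roughly as follows. This is a minimal Python sketch of the interaction described above, not Vista's implementation; the Element and NumberOverlay classes are hypothetical, and spoken numbers are simplified to digit strings.

```python
from dataclasses import dataclass

@dataclass
class Element:
    """A hypothetical clickable item in the active window."""
    label: str
    clicked: bool = False

class NumberOverlay:
    """Toy model of the 'Show numbers' overlay."""

    def __init__(self, elements):
        # Number the elements starting at 1, the way a user would say them.
        self.numbered = {i: el for i, el in enumerate(elements, start=1)}
        self.pending = None  # element waiting for the "OK" confirmation

    def say(self, phrase):
        word = phrase.strip().lower()
        if word.isdigit() and int(word) in self.numbered:
            # Saying a number only selects the element; nothing is clicked yet.
            self.pending = self.numbered[int(word)]
        elif word == "ok" and self.pending is not None:
            # Saying "OK" confirms the selection and performs the click.
            self.pending.clicked = True
            self.pending = None
```

With three elements on screen, calling say("2") and then say("OK") marks only the second element as clicked. The confirmation step means a misrecognized number can be noticed before anything happens.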

Getting help

If you don't know how to do something, you can use Help (in English only) by asking, "How do I" followed by the task you want to perform. For example, you might ask, "How do I turn on speech recognition?" Windows will show you a list of Help topics that seem to match your question, as shown in Figure I.

Figure I

You can ask how to do a task and Windows will display Help files that match your question.

Dictating text

You can dictate text into any speech-enabled application. You are not limited to Microsoft Office applications as you were in the past. For example, you can dictate into Notepad or WordPad. You can also dictate into the Windows Live Writer blogging application.

By default, I was not able to dictate into Open Office Writer and other non-Microsoft programs. When I selected Options | Enable Dictation Everywhere from the Speech context menu (the setting for dictating into programs that don't automatically accept dictation), I was able to dictate into the Open Office program. However, it didn't work as well as with Microsoft programs: instead of immediately typing the text I spoke, it would pop up a number of alternatives for me to choose from. With the Speech APIs, developers can make their applications speech-enabled (and many more probably will in the future).


Tip

When you're dictating, Vista will type everything you say into the document. It can be a little disconcerting if, while working on a document, you stop to talk to a colleague and then find your end of the conversation transcribed into the document. After a while, it becomes second nature to tell Vista, "Stop listening" when you want to say something you don't want transcribed.


It's likely that Vista will make mistakes when transcribing your dictation. The good news is that they're easy to correct. For example, if you say, "I need another byte" and Vista types "I need another bite," you can just say, "Correct bite," and you'll be presented with a list of replacement words, as shown in Figure J.

Figure J

Correcting mistakes is easy; you can pick from a list of possible replacements.

If the correct word isn't in the list, just say, "Spell it." A box will appear where you can spell the word one letter at a time, as shown in Figure K.

Figure K

If the word you need isn't in the list, you can spell it.

Advanced configuration settings

You can access several advanced configuration settings by clicking the Advanced Speech Options link in the left pane of the Speech Recognition Options dialog box, as shown in Figure L.

Figure L

You can set advanced configuration options and train your profile.

Here, you can create and train speech recognition profiles. This is useful when more than one person shares the computer. You can also choose whether to run speech recognition at startup and whether to allow the computer to review your documents and mail to improve the accuracy of the speech recognition engine.

In addition, you can select the number of spaces to insert after punctuation marks and adjust the microphone level.

Speech recognition limitations

I was impressed with the ease of use and accuracy of the Vista speech recognition engine after half an hour of training time. I've tried dictation programs before and never found them at all usable; I could always type much faster than I could dictate and correct text. Now I finally feel that if I should ever lose the use of my hands, there would still be a way for me to continue to get my work done. For me, a combination of speech recognition (primarily for commands) and keyboard input works well.

However, I'm using Vista on a high-end computer system that has a Core Duo processor and 4 GB of RAM. I can't vouch for how fast it works on a less powerful computer. I'm also using a headset microphone. As I mentioned, my experience shows that a desktop microphone doesn't work nearly as well. Putting in some time training the engine to your own voice also makes a big difference.

For obvious reasons, speech recognition wouldn't work well in a noisy environment where you share an office with other people who are talking or on the phone while you work, nor would it work well if you like to listen to music or talk radio while you work.

Security issues

Before you decide to start talking to your computer all the time, be aware that there's a security issue involved with using speech recognition. George Ou went into detail about it in his blog. Here's the gist: An attacker could embed a sound file that plays automatically when you go to a Web page or send you a sound file in e-mail that plays when you double-click on it. If the sound file that plays through your computer speakers is a command recognized by Vista's Speech engine, and the speech recognition feature is running, the computer will carry out the command.

This isn't quite as scary as it could be. To perform most administrative tasks in Vista, you have to respond to the User Account Control prompt, which can't be done by voice. However, it's possible for the attacker to delete a file on your computer using this method.

When speech recognition is in Sleep mode, it responds only to the words "Start listening," but the attacker could easily put that phrase at the beginning of the sound file to turn it on. Thus, the best practice is to always turn speech recognition off completely when you aren't using it, rather than leaving it in Sleep mode, and not to configure it to run when you start Windows.
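
The difference between Sleep and Off can be made concrete with a small Python model of the three listening modes. It is a sketch under stated assumptions, not the real engine: the SpeechModel class, the play_audio helper, and the "delete file" phrase are hypothetical placeholders, and the model assumes that "Stop listening" returns the engine to Sleep.

```python
from enum import Enum

class Mode(Enum):
    ON = "on"        # listening and acting on recognized commands
    SLEEP = "sleep"  # ignoring everything except "Start listening"
    OFF = "off"      # not listening at all

class SpeechModel:
    """Toy model of the listening modes described in this article."""

    def __init__(self, mode):
        self.mode = mode
        self.executed = []  # commands the model would have carried out

    def hear(self, phrase):
        text = phrase.strip().lower()
        if self.mode is Mode.OFF:
            return  # nothing is heard, so nothing can wake the engine
        if self.mode is Mode.SLEEP:
            if text == "start listening":
                self.mode = Mode.ON
            return
        # Mode.ON from here on.
        if text == "stop listening":
            self.mode = Mode.SLEEP  # assumption: this puts the engine to sleep
        elif text != "start listening":
            self.executed.append(text)

def play_audio(model, phrases):
    """Feed the phrases of a (possibly malicious) sound file to the model."""
    for phrase in phrases:
        model.hear(phrase)
    return model.executed
```

Playing the sequence "Start listening" followed by "delete file" against a model in Sleep mode ends with the command executed, because the first phrase wakes the engine. The same sequence against a model that is Off executes nothing, which is why turning the feature off completely is the safer habit.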

About

Debra Littlejohn Shinder, MCSE, MVP is a technology consultant, trainer, and writer who has authored a number of books on computer operating systems, networking, and security. Deb is a tech editor, developmental editor, and contributor to over 20 add...

18 comments
nivethas

Windows Vista Speech Recognition focuses not only on providing world-class accuracy, but on providing the most usable end-to-end speech recognition experience. It addresses key issues that currently frustrate or confuse users of existing products. Windows Vista Speech Recognition provides an efficient, enjoyable way to get your tasks done with speech. Microsoft Help

robert.lamb.us

I have had RSI for some time now, and thought Windows Vista speech recognition would be the solution for my tendinitis. However, Windows Vista speech recognition has some serious limitations, especially if you want to click anywhere on the screen. I settled upon an extension for Win Vista called Voice Finger ( http://voicefinger.cozendey.com ), which somehow fills the gaps in Win Vista recognition. I guess this software is not targeted at people who use speech recognition as an alternative from time to time, but if you want (or need) to reduce computer contact to zero, this software is great.

jackwoelfel

I have Vista Ultimate. How do I add French dictation?

alexeivna

I work as a caregiver with a guy who uses Dragon Dictate. He had Vista briefly on a laptop that didn't work out, and really preferred their VR software. Is it available for download (I haven't found it yet!) AND will it work (at least well enough for basic functions) on XP Home? I'd be most obliged if anyone has an answer! Many thanks, Tatiana

mluck

I've used Dragon NaturallySpeaking for several years now, most recently 9.5 Pro, and found it to be great, with proper user training and experience. Unfortunately, Dragon won't run on my 64-bit Vista (Nuance says it should support 64-bit in the next major release), so I've had to try Vista's. I must say that it's 'OK', but in my experience Dragon still works better: accuracy rate, ease of correction, and other factors. But in a pinch, Vista will do.

foreigner

I have no experience with Vista. Here are some questions for experienced persons. Whose speech engine did MS buy or license? How does the MS edition compare to the original? How does it compare to other products of vendors such as ScanSoft? I wonder whether the various dictionaries are integrated with Office instances. Ideally, there should be one set of dictionaries / language that all (proofing) products use (O/S AND Office). Is the language of speech to be recognized distinct from the "speech dictionary"? I prejudge that this is a futile question. It is not frivolous. I would choose 'en-US' speech recognition but require English (non-American) proofing dictionaries, to work in a one-pass process, without presumption and imposition of a USA particular default.

dan

I've created "getting started" video lessons for Vista Speech Recognition. You can see them at www.talktowindows.com. Dan Newman

georgeou

Speech is flaky sometimes, and it's not just a Microsoft thing. Speech recognition and dictation have been around for some time, and they still lack a lot of maturity. Sometimes I found it to be pretty good, sometimes it would just go crazy on me and I'd be tearing my hair out. There's a pretty funny video on YouTube showing a Microsoft demo that completely went crazy even though the Microsoft guy was wearing a headset. I can testify that I've had similar crazy experiences. Thanks for mentioning the security issue. One workaround that I forgot to mention is that you can use a headset and just leave voice command on permanently and safely. It's only a problem if you're running speaker and mic mode and you leave the speaker and mic running. I have started to use speech for dictation sometimes not because I can't type fast, but because I want to practice talking clearly and smoothly.

JodyGilbert

Have you experimented with Vista's speech recognition feature? Did it work well enough to be a useful tool or did you find it too awkward to be practical?

sml

As far as I can tell, the Vista speech recognition engine is not available by itself. However, if he has WinXP, the engine is available there, under Control Panel > Speech or under Control Panel > Regional and Lang. Options. However, in WinXP, if you install Office 2007, the Speech tools are removed. Why? Because in WinXP, Speech is a part of the MS Office suite, not the OS. In Vista, it is now part of the OS.

sbridgeford

I started using speech recognition in the early beta versions of Vista. I have always thought that this one feature was a good enough reason to upgrade Windows. Recently I had a client send me a JPEG image of a text document (don't ask me why he only had it as an image). I did not look forward to having to retype the document into the Web page that I was developing for his Web site. That's when I got the idea to just sit back and read the document into a text editor. Within a matter of a few moments the entire document went from an unusable image format into a text format that could be used on the Web site. It saved me a lot of time and effort in re-keying the document, and I was able to turn around the Web page update very quickly. In fact, if you're interested, I am using voice recognition to type in my response to this posting.

Fil0403

Have you experimented with Vista's speech recognition feature? Yes. Did it work well enough to be a useful tool or did you find it too awkward to be practical? It definitely worked well enough to be a useful tool.

mluck

I've worked with speech recognition for several years and am partial to Dragon NaturallySpeaking. I have begun using Vista's built-in recognition, however. My high-level recommendation is the same for both: without some initial training, you probably won't get the best benefit of speech, and may in fact be turned off by it.

starsew

I went through the tutorial last week and have started using the speech recognition, which I think is wonderful. It's fun, and my daughters like the idea of being able to dictate their school essays and papers without typing. I've been using Windows Vista for about three weeks now and I'm very satisfied. There are a few quirks, but all in all, it's one of Microsoft's best.

mark

I've had it on my desktop PC, my laptop, my wife's laptop, my personal TabletPC and my office TabletPC. It's part of the Speech tools. As far as I can tell it's part of the OS (pre-installed on all of the machines I've listed above), so surely that means Vista ISN'T the first Microsoft OS with speech recognition?

Why Me Worry?

because those are the only sounds and words that will be coming out of the mouths of people who try to get Vista to work properly.

sml

It is true that speech recognition is in Windows XP, but it's really only useful within the Microsoft Office programs and other programs where you're simply using it to type text with your voice. In Vista, speech recognition works all over the OS, from the desktop to the browser to every single program: you can use your voice to navigate around the OS, select commands, and dictate text. In fact, I have dictated my reply using speech recognition. It works pretty well!

Fil0403

because it is a very helpful tool to people so dumb that they can't even get Vista to work properly, like you.
