Meaningful conversations with your PC using TTS

Need someone to talk to? Try your PC. Jim Wells takes a look at configuring Text To Speech using Microsoft's Windows XP in this Daily Feature.

We communicate with others in a number of ways every day, many of them electronic. With so much communication now handled by e-mail, telephony, video conferencing, and the like, the need to meet with other humans face to face is slowly diminishing. Maybe that is why developing people skills is not such a high priority among many IT professionals. I can recall several days in my career when I never needed to communicate with another human other than by electronic means.

Thus, it is a natural progression that software manufacturers are developing technology that allows us to communicate with our computers through speech. Using the human voice as an input medium is not new, but making it practical to work with in a real-world setting is. In this Daily Feature, I will examine Windows XP’s new Text To Speech features and configuration. In a follow-up Daily Feature, I will cover some situations where this technology can be useful using Microsoft’s Office XP.

What you need
Make no bones about it; Text To Speech (TTS) is hardware-intensive. The minimum recommended processor is a Pentium-class 400 MHz with a minimum of 128 MB of RAM. If you are using XP, you should have already beefed up your systems beyond this minimum to accommodate the new OS. A decent sound card and microphone are also musts. My test machine had a Creative SB Live card that could push the decibels, but my cheap handheld microphone proved to be a nuisance. Microsoft recommends using a USB headset microphone to keep background noise, the effects of off-axis volume, and tone fluctuations to a minimum.
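
If you have several machines to size up, the minimums above can be written down as a quick check. This is only a sketch: the Machine record and the speech_readiness function are hypothetical names made up for illustration, and the thresholds are simply the 400 MHz and 128 MB figures quoted above, not anything read from Windows itself.

```python
from dataclasses import dataclass
from typing import List

# Minimums quoted in the article; everything else here is illustrative.
MIN_CPU_MHZ = 400
MIN_RAM_MB = 128


@dataclass
class Machine:
    """Hypothetical description of a PC being considered for speech work."""
    cpu_mhz: int
    ram_mb: int
    has_sound_card: bool
    has_microphone: bool


def speech_readiness(machine: Machine) -> List[str]:
    """Return a list of shortfalls; an empty list means the minimums are met."""
    problems = []
    if machine.cpu_mhz < MIN_CPU_MHZ:
        problems.append(f"CPU {machine.cpu_mhz} MHz is below {MIN_CPU_MHZ} MHz")
    if machine.ram_mb < MIN_RAM_MB:
        problems.append(f"RAM {machine.ram_mb} MB is below {MIN_RAM_MB} MB")
    if not machine.has_sound_card:
        problems.append("no sound card")
    if not machine.has_microphone:
        problems.append("no microphone")
    return problems
```

Calling speech_readiness(Machine(300, 64, True, False)) would report three shortfalls: the processor, the memory, and the missing microphone.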

Setting it up
Configuration of TTS is straightforward. A few setup screens (explained below) and you’re done. You won’t need much time to read through instructions or wade through lots of dialog boxes, but you’ll need to set some time aside to talk to the computer to train it to understand your voice.

The more you train with TTS, the better your results. When I first started working with TTS, I did just one training session. When I used TTS within various Office products, I found myself doing more editing with the keyboard than anything else because of all the mistakes it was making. I believe the biggest factor TTS had to overcome was my accent. (Southerners tend to have a drawl.) After I spent an hour or so talking to the system using XP’s Voice Training module, the mistakes were reduced.

Begin the configuration process by opening Control Panel from XP’s Start menu and selecting Other Control Panel Options. There, you will find the Speech Control Panel icon (Figure A).

Figure A
Why Microsoft chose these three services to be listed in another control panel window is beyond me.

Double-click the icon to get to the Speech Properties dialog box (Figure B). With this dialog box, you can control the speech profiles for different environments or other users by clicking on the New button in the Recognition Profiles section of the Speech Recognition tab. For instance, you could set up one profile for your office during the day, when there is a lot of background noise to contend with, and one for nighttime to reflect a quieter environment.

Figure B
If you have another language-recognition engine loaded on your machine, you can choose it from the Language drop-down box.
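
The day and night profile idea can be pictured as a simple lookup. The sketch below is only a model of the concept, not of how XP stores its profiles: the RecognitionProfile type, the profile names, and the 8:00 to 18:00 working hours are all assumptions chosen for illustration. In XP itself, you switch profiles by hand in the Speech Properties dialog box.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class RecognitionProfile:
    """Hypothetical stand-in for a trained recognition profile."""
    name: str
    environment: str


# Two profiles, each trained under different background-noise conditions.
DAY = RecognitionProfile("Office (day)", "noisy")
NIGHT = RecognitionProfile("Office (night)", "quiet")


def pick_profile(now: time,
                 day_start: time = time(8, 0),
                 day_end: time = time(18, 0)) -> RecognitionProfile:
    """Choose the daytime profile during working hours, else the night one."""
    return DAY if day_start <= now < day_end else NIGHT
```

With these assumed hours, pick_profile(time(10, 30)) selects the noisy daytime profile and pick_profile(time(22, 0)) selects the quiet one.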

Getting to know your PC
The next step involves training the PC to recognize your voice. When you click on the Train Profile button, you are taken to the Voice Training Wizard (Figure C). There is a plethora of reading choices. For each selection, you must speak each word clearly enough for the PC to understand you before moving on to the next word (Figure D). This step took over an hour for me, as I went through several choices more than once.

Figure C
While the choices for reading seem odd, they do require you to enunciate several difficult words in order to cover a wide spectrum of voice inflections.

Figure D
If the PC just can’t understand you, click on the Skip Word button to move on.

Should the need arise, you can work around dialect differences or unusual spellings of certain words by clicking on the Add/Delete Word(s) option located on the Tools menu of the XP Language toolbar (Figure E). Type in the word you want the PC to recognize and click the Record Pronunciation button before speaking the word into your microphone.

Figure E
This is a handy feature if you find TTS consistently stumbling on certain words.

Odds and ends
The rest of the configuration process is more related to user preference than anything else. For example, by clicking on the Text To Speech tab of the Speech Properties dialog box, you can choose either a male or female computer voice (Figure F). You can also change the speed at which the computer speaks by adjusting the Voice Speed slider.

Figure F
Clicking on the Preview Voice button allows you to sample each voice. The default voices available all remind me of HAL from the movie 2001: A Space Odyssey.
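
The voice and speed settings can also be driven from a script through the Speech API's automation object, SAPI.SpVoice, whose Rate property runs from -10 (slowest) to 10 (fastest). The sketch below rests on a few assumptions: that the third-party pywin32 package is installed, that the slider can be treated as a position between 0.0 and 1.0, and that it maps evenly onto the Rate range. The function names slider_to_rate and speak are hypothetical.

```python
import sys

# Rate range of the SAPI SpVoice automation object.
SAPI_MIN_RATE, SAPI_MAX_RATE = -10, 10


def slider_to_rate(position: float) -> int:
    """Map a slider position in 0.0..1.0 onto the SAPI rate range -10..10.

    The linear mapping is an assumption, not documented dialog behavior.
    """
    position = min(1.0, max(0.0, position))  # clamp out-of-range input
    return round(SAPI_MIN_RATE + position * (SAPI_MAX_RATE - SAPI_MIN_RATE))


def speak(text: str, position: float = 0.5) -> None:
    """Speak text with the default voice at the given slider position."""
    if sys.platform != "win32":
        raise RuntimeError("SAPI is only available on Windows")
    import win32com.client  # third-party pywin32 package (assumed installed)

    voice = win32com.client.Dispatch("SAPI.SpVoice")
    voice.Rate = slider_to_rate(position)
    voice.Speak(text)
```

On a Windows machine with a speech engine installed, speak("Good morning, Dave.", 0.25) would read the sentence aloud at a rate of -5, somewhat slower than normal.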

More options are available by clicking the Settings button on the Speech Recognition tab. Here, you can choose how quickly or slowly TTS will respond to your voice input (Figure G). Your two choices are Pronunciation Sensitivity and Accuracy vs. Recognition Response Time. I found that leaving these options at their default settings was acceptable for most situations. However, you might want to lower the Accuracy vs. Recognition Response Time indicator if your PC is moving too slowly.

Figure G
Only very rarely would you leave the Background Adaptation check box unchecked.

In this Daily Feature, I have shown you the process of setting up TTS for Windows XP. TTS, by no means a new technology, is rapidly approaching broader acceptance as it becomes more reliable and cost-effective. While I would not advise getting rid of your mouse and keyboard just yet, it does appear that they are losing some ground as the preferred method of input.

Who knows where this technology could lead? One day, I am sure there will be a program that handles two-way conversations. Will this just further the trend toward complete isolation from human interaction? Probably so, but personally, I say go for it. So what if the listener on the other end is not human? Who knows, it might have something interesting to say. It could even be an improvement over many of the human conversations we have to endure throughout the day.

