A prominent theme at the 2004 Consumer Electronics Show revolved around the personal data network and the devices that would connect your little corner of personal space to the rest of humanity via wireless networks and the wonder of the Internet. Companies not normally considered part of the consumer electronics market, such as Intel and Microsoft, have come to the realization that there is money to be made from catering to our seemingly insatiable appetite for connectivity.
But interfacing with these devices requires a certain amount of dexterity as you manipulate Chiclet-size keys or a too-thin, easy-to-lose stylus. This is where the potential of speech recognition interfaces lies.
Yes, speech recognition is again a rising cultural meme. While the idea of speech-enabled interfaces has been around since the invention of the computer, technological reality has always been the cold slap in the face that quieted even the most enthusiastic advocates.
Perhaps the latest initiative in the form of Speech Application Language Tags (SALT) technology will be able to overcome at least some of the technical issues that have plagued speech-enabled interfaces for so many years. If nothing else, SALT has the backing of some very large, powerful, and deep-pocketed companies; whether that is enough to make the technology an effective reality for the consumer electronics market is yet to be determined.
According to the SALT Forum, the organizational body that designed the initial specification and is working to develop the actual standard, SALT is defined as:
“…a small set of XML elements, with associated attributes and DOM object properties, events, and methods, which apply a speech interface to Web pages. SALT can be used with HTML, XHTML, and other standards to write speech interfaces for both voice-only (e.g. telephony) and multimodal applications.”
The idea is that SALT will enable users to interact with an application in a variety of ways (the multimodal part of the definition) by allowing input with speech, keyboard, mouse, and/or stylus. The tags would provide a mechanism for output of synthesized speech, audio, text, video, and graphics. The form factors envisioned for the SALT technology include telephones, handhelds, and PCs.
To maintain the broadest interoperability, the draft SALT specification leverages existing standards in HTML, XML, etc., as well as speech standards established by the W3C, such as the Speech Recognition Grammar Specification (SRGS) and the Speech Synthesis Markup Language (SSML).
There are three main top-level elements in SALT:
- <listen>—This is the element that configures the speech recognizer, executes recognitions, and handles speech input events.
- <prompt>—This is the element that configures the speech synthesizer and plays any prompts.
- <dtmf>—This is the element that configures and controls Dual Tone Multi-Frequency (DTMF) in telephony applications.
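To give a feel for how these elements sit in a page, here is a minimal sketch of a <prompt> and a <dtmf> element, loosely following the conventions of the draft specification. The element ids, the grammar URL, and the target text field are hypothetical, and the example assumes the salt namespace prefix has been declared on the page.

```xml
<!-- Hypothetical fragment: play a synthesized prompt, then accept
     touch-tone input in a telephony application. -->
<salt:prompt id="askExtension">
  Please enter the four-digit extension of the party you are calling.
</salt:prompt>

<salt:dtmf id="getExtension">
  <!-- The grammar constrains which key sequences are accepted;
       this URL is illustrative, not a real resource. -->
  <salt:grammar src="http://example.com/grammars/extension.grxml" />
  <!-- Copy the recognized digits into a form field on the page. -->
  <salt:bind targetelement="txtExtension" value="/result/extension" />
</salt:dtmf>
```

In a typical page, script on the client would activate these elements in sequence, for example by calling the prompt's start method and then the DTMF collector's start method.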
The <listen> and <dtmf> elements may contain <grammar> and <bind> elements, and the <listen> element can also hold the <record> element. Listing A shows a <listen> example that holds a remote grammar containing city names, and a bind statement to process the recognition result.
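As an illustration of that pattern, the following sketch shows what a <listen> element with a remote grammar and a <bind> statement might look like. The grammar URL, element ids, and result path are hypothetical, and the fragment assumes the salt namespace prefix has been declared on the page.

```xml
<!-- Hypothetical fragment: recognize a spoken city name and
     bind the result into a page element. -->
<salt:listen id="listenCity">
  <!-- Remote grammar of city names; this URL is illustrative. -->
  <salt:grammar src="http://example.com/grammars/cities.grxml" />
  <!-- On a successful recognition, copy the city value from the
       recognition result into the txtCity field. -->
  <salt:bind targetelement="txtCity" value="/result/city" />
</salt:listen>
```

The <bind> element is what makes SALT feel at home in a Web page: rather than forcing the developer to parse recognition results in script, it declaratively routes a piece of the result into an existing HTML element.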
The SALT Specification 1.0 is available for download from the SALT Forum in PDF form. Note that it is 112 pages long, so it may take a while to download.
Besides Microsoft and Intel, the SALT Forum includes prominent members such as Cisco Systems, Philips, Hitachi, InfoTalk Corporation, and Verizon. There are dozens of other major corporations actively participating in the development of the standards and specifications. The number of companies and the breadth of the industries they encompass should give you a good indication of the seriousness and determination of the SALT Forum.
With that kind of deep-pocket backing, it’s a real possibility that the SALT standards will move from mere theory to practice. Assuming that does come to pass, there should be tremendous opportunity for software developers to create applications.
There will be opportunities to create not only the user applications residing on the end device, but also the networking infrastructure that must transmit the multimodal data, and the middleware that will translate that data into transactions on a shopping cart or track entries in a database, for example. Being familiar with the standards and specifications outlined by the SALT Forum could be a smart investment in your future as a software developer.
There seems to be a real trend toward actual applications that take advantage of speech and other non-traditional forms of user input. Is anyone in the Builder community developing speech-enabled applications? Do you think that such applications are still years away or just around the corner? How much is hype and how much is reality? We’d love to hear what you think, so start or join the discussion in the forum at the end of this article and let us know.