Date Added: Jan 2010
When a speaker leaves a voicemail message, prosodic cues emphasize the important points in the message, in addition to the lexical content. This paper compares and visualizes the relative contribution of these two types of features within a voicemail summarization system. It describes the system's ability to generate summaries of two test sets, after training and validation on 700 messages from the IBM (International Business Machines) Voicemail corpus. Results measuring the quality of summary artifacts show that combined lexical and prosodic features are at least as robust as combined lexical features alone across all operating conditions. Speech is a very rich communication medium, and recent efforts have sought ways to incorporate prosodic cues in order to extend the capabilities of spoken dialogue and audio browsing/retrieval systems. An important aspect of this approach is the combination of prosodic, acoustic, and language information to achieve results that are more robust than those obtained from any single source. Humans use prosody to disambiguate similar words, to group words into meaningful phrases, and to mark the importance of words or phrases. The acoustic correlates of prosody are among the cues least affected by noise, so human listeners likely use prosody as a redundant cue to help them recognize speech correctly in noisy environments. Spontaneous and read speech differ in prosodic structure, with spontaneous speech having shorter prosodic units.