· Where are we now with speech recognition over the phone?
· What are people's reactions to using speech applications?
· What's involved in designing a speech UI?
· What have we learned about designing speech UI from users?
· Should a very personal computer use a voice user interface?

For decades, science fiction has imagined a world where people can have human-like conversations with computers. One of the great challenges for us now is understanding how to design the voice user interfaces of these systems.

Language is so natural and varied for different speakers that it takes an extremely well-crafted UI to predict what the caller is going to say next and constrain the caller's responses enough to suit the still limited technology.

All the while, the callers need to come away from the experience with a general liking of the system's style and "voice."

What users think about speech systems and what makes a voice user interface successful are well studied, as is how to align the dialog with the caller's mental model, create clear and efficient navigation, write natural prompts, and recover gracefully from errors. Many of the design principles can be generalized to all interfaces.

products and books


Some voice user interface notes and links:

  • "With the cost of the Voice User Interface (VUI) accounting for half the total cost of ownership for a speech project, it is critical that enterprises evaluating speech better understand the value that a high quality VUI brings. One key hurdle to evaluating investment in the VUI is the subjective criteria that do not lend themselves well to financial analysis. It’s difficult to say how much creation of a persona, or a user-friendly application are really worth. A simple but powerful way to overcome this hurdle is to integrate the concept of Customer Life Time Value (LTV) into a traditional Return on Investment (ROI) analysis. ... More rigorous analysis would factor in drivers such as time to market, financing charges, telecom contract rates, transaction type and metrics, costs of call blocking and abandoned calls, Erlang calculations for system sizing, revenue capture opportunities, specific wage rates and a probable estimate of overhead reduction."
  • "Speech application development is evolving to dynamically generated VoiceXML. Now companies can cost-effectively add speech to Web apps and not sacrifice the quality of the resulting Voice User Interface. Reusable Dialog Components, a component framework based on JavaServer Pages, are central to this evolution. Explore this roadmap for driving down the overall cost of creating, deploying, and managing speech solutions. Also, learn how complex speech applications built with today's technologies can interoperate with speech-enabled Web applications for a smooth transition and a seamless user experience."
  • VUI Total Voice; "The speech-technology industry took its first step toward the adoption of a Web programming model by standardizing VoiceXML, Version 2.0. First-generation voice-enabled Web applications were mostly built of static VoiceXML pages" - http://www-306.ibm.com/software/pervasive/ws_vaa/
    • "WVAA V5 allows users to take advantage of the personalization features of WebSphere Portal to tailor their individual voice portals to fit their needs.
    • For the administrator, it provides a consistent framework for administering users and extending the same portal security features for authentication/authorization across multiple channels.
    • For the developer, WVAA V5 leverages the same Eclipse-based programming model as WebSphere Portal to build applications using VoiceXML technology."
  • Client-side framework for life cycle management through any Eclipse-based IDE to "create, test, deploy and analyze voice applications, leveraging the full advantages of the proven VoiceObjects Server technology."
    • "Its built-in best practices for Voice User Interface (VUI) design make it easy to achieve an optimal caller experience, while its support of VoiceObjects’ unique layer technology enables the rapid incorporation of dynamic content, multiple languages and personas into a single voice-driven service. VoiceObjects Server provides the reliable and scalable service execution and management platform to seamlessly integrate into existing environments and to successfully operate voice applications."
  • Voxeo IVR scaling
  • "VoiceNavIt is the most versatile, powerful, and timesaving application provided for PC, PDA, and Smartphone devices. This simple to Use Voice Command and Control application with macro recording and playback allows you to automate your most common tasks with your voice command for the utmost in convenience. Just a simple voice command can retrieve your emails, create new tasks, open your calendar and switch between views, take - save – send a picture, launch your favorite PalmOS application and much, much more. The possibilities are as endless as your imagination."
  • CareDecision MDCommand
  • "Patent Abstract: A speech recognition system front end interface to provide a subscriber voice control over many voice mail functions. The speech recognition system converts spoken instructions into DTMF instructions for voice mail systems while allowing prompts from the voice mail system and DTMF tones from the subscriber to pass through without any interference by the speech recognition system. To accomplish these pass through functions, the speech recognition system sets up what is known as a hairpin connection between the subscriber, the speech recognition system and the voice mail system"
  • "Voice Self Service represents the next generation of self service applications which are now simple and convenient enough to be customer friendly. Voice Self Service provides the benefit of instant access with an ‘easy to use’ interface that is close enough to a real life interaction to feel natural. Already there is a broad range of ‘standard’ applications that show where direct agent contact can be safely removed in favour of self service."
  • "Giving a Voice to User Interfaces" (http://www.baychi.org/calendar/20011113/), Nicole Leduc and Jennifer Balogh, Nuance Communications, http://www.baychi.org/trackback/64, presented "VUI as premature technology in the 80's, as DTMF replacement in the 90's in which digits and yes or no responses are understood, as capable of supporting directed dialog with a small vocabulary in '94, as natural directed dialog in '96 with natural directed dialog and a larger vocabulary, as natural voice dialog in which a user is able to say what they mean in '98" and by 2001 "as natural voice interface incorporating natural speech and understanding today. An example of a VUI that is able to provide understanding would be a VUI implementation of a stockbroker."
    • "Enterprise markets are using VUIs for financial, travel, retail and customer contact applications to attain cost reduction, improved customer service and customer retention. Telecom markets are using dialing, messaging and directory lookup to provide differentiation through added services and new revenues. Portal markets are using VUIs for V-Commerce, voice portal services and web contact access to increase usage especially mobile, provide new services and gain additional revenue through V-Commerce. Nicole Leduc then leads into the next section noting that hundreds of speech applications have been deployed and posing these questions:"
    • "two quantitative surveys conducted in 1999 and 2000... measure users satisfaction. The survey includes 500+ recent VUI users or callers who used one of six different applications covering a range of vertical markets. The survey was conducted over the phone by an outside research firm, Evans. The analyses of the survey results were done by Nuance. The survey provided statistics on user population, usage and satisfaction, as well as a baseline to measure company performance... 83% of users were satisfied by using VUIs in 1999 and 87% were satisfied by using VUIs in 2000. Satisfaction is the sum total of users who selected any of the three options: completely satisfied, very satisfied or somewhat satisfied. Nicole Leduc noted a wide range of scores, 60-98%, were observed between applications. Applications that underwent more revision cycles had higher success rates. The survey found the following results when comparing VUIs with speaking to a human, using a DTMF (Dual-Tone Multi-Frequency) system or using the Web. Users were given three options to indicate how a given method compared to VUIs: great improvement, somewhat better or the same, or somewhat worse or much worse. When comparing VUIs to talking to a human 38% found it a great improvement whereas 16% found it somewhat worse or much worse. When comparing VUIs to using a DTMF system 34% found it a great improvement whereas 20% found it somewhat worse or much worse. When comparing VUIs to using the Web 29% found it a great improvement whereas 16% found it somewhat worse or much worse...voice user interfaces can be satisfying if the user experience has been well thought out, the user interface captures the mental model of the user and the experience provides advantages over means of doing the same thing, using other methods."

What's involved in designing a speech UI?


Designing speech interfaces is more difficult than designing for the web, requiring both dialog design and application code. Dialog design includes call flow, prompts, and grammars; prompts in turn cover wording, persona, voice recording, and audio sounds.

Graphical user interfaces and voice user interfaces share the need for information architecture, content, required functionality and frequently share navigational flow. The most noticeable difference between the interfaces is the use of different input mechanisms; voice user interfaces obviously use voice for input whereas graphical user interfaces use keyboard and mouse input mechanisms.

However, voice user interfaces are also ephemeral in duration, users have a linear perception of them, they can be highly evocative, have the potential of being very natural, and can typically be utilized anywhere as minimal equipment (typically only a phone) is generally required.

The job is further complicated because speech recognition is not 100% accurate, speech is ephemeral, too little speech interface research is being done, and every user is already an expert in spoken conversation.

What have we learned about designing speech UI from users?


Successful speech interfaces are accurate, minimize cognitive load, are clear and efficient, recover gracefully from errors, and use natural prompting. Studies show accuracy was a key element in predicting user satisfaction with various voice applications. Reported problems include an Australian application that could not recognize the number 2, and trouble with phone numbers given as identification. Such failures eroded user trust and led to additional mistakes and more situations in which users simply gave up.

Designers must minimize the cognitive load on users. To do this, design interfaces that leverage the user's mental model, respect the limits of short-term memory, and set the user's context in advance. For example, a touchtone application should not use company departments as menu choices; doing so leads to low call completion and customer satisfaction rates. Redesigning such a system, using statistics to identify the types and frequency of customer calls and to define a new call flow, can increase call completion by 40%.

Miller's magic number, 7 ± 2 chunks in short-term memory, is said to have led to phone numbers being 7 digits long. For verbal menus in speech interfaces the number is much smaller: keep menus to around 3 choices and no more than 5. With long menus, users either do not speak or speak too late. If the application requires more options than can reasonably be offered in a voice prompt, list the most frequent options and let users hear the others by saying "help".
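The menu guidance above can be sketched in a few lines of Python. This is only an illustration, not a method from the studies cited: the function name `build_menu_prompt`, the default of 3 spoken choices, the prompt wording, and the sample request counts are all assumptions made for the example.

```python
from collections import Counter

MAX_SPOKEN_CHOICES = 3  # assumption: "around 3 choices and no more than 5"


def join_choices(choices):
    """Join option names the way a prompt would read them aloud."""
    if len(choices) == 1:
        return choices[0]
    if len(choices) == 2:
        return f"{choices[0]} or {choices[1]}"
    return ", ".join(choices[:-1]) + f", or {choices[-1]}"


def build_menu_prompt(request_counts, max_choices=MAX_SPOKEN_CHOICES):
    """Return (prompt, spoken, deferred) for a verbal menu.

    request_counts maps an option name to how often callers asked for it
    (hypothetical call statistics). The most frequent options are spoken;
    the rest are left for the "help" prompt.
    """
    if not 1 <= max_choices <= 5:
        raise ValueError("spoken menus should offer between 1 and 5 choices")
    if not request_counts:
        raise ValueError("at least one menu option is required")

    # Rank options from most to least frequently requested.
    ranked = [name for name, _ in Counter(request_counts).most_common()]
    spoken, deferred = ranked[:max_choices], ranked[max_choices:]

    prompt = f"You can say {join_choices(spoken)}."
    if deferred:
        prompt += ' For more options, say "help".'
    return prompt, spoken, deferred
```

Called with counts for five options, the function speaks the top three and defers the remaining two to the help prompt, so the menu stays short however many options the application supports.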

To learn how to set context for the user, use Wizard of Oz studies (in which a hidden human operator simulates an automated system). One such study for AT&T found that greetings of varying length produced significantly different levels of user success. Contrary to common perception, the shortest greeting produced the lowest level of success even though it was the most time-efficient. The longer greetings set the context for users, which improved the system's success rating.

In a food ordering application, failing to inform users that the system is voice-automated can cause a high rate of hang-ups or silence. Tutorial and introductory messages can reduce the number of hang-ups.

Consistency and use of a persona can improve a user's comfort level. The persona must be well defined and appropriate for the application, have a voice talent suited to it, and have every prompt written to adhere to it consistently. Prompts streamlined to be more conversational and natural are usually more satisfying and comfortable for users. Most users found the use of "I" in prompts more satisfying, and the same was true of discourse markers such as "okay" and of personal pronouns. See first person and diagnostic dialogue.

Efficient, clear design requires defining a consistent persona and metaphor, telling callers where they are with landmarks such as audio and error prompts, and streamlining the flow by embedding shortcuts. Graceful error recovery is helped by rapid reprompts, delayed help, and confirmation and correction of voice input.
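One way to picture how rapid reprompts, delayed help, and confirmation fit together is the small Python sketch below. It is a hedged illustration only: the `Turn` type, the `next_turn` function, the confidence thresholds, the failure limit, the prompt wording, and the final hand-off step are assumptions for the example and do not come from the source.

```python
from dataclasses import dataclass

# Assumed values; the source gives no numbers.
CONFIRM_BELOW = 0.8  # below this, confirm the result with the caller
REJECT_BELOW = 0.4   # below this, treat the attempt as a failure
MAX_FAILURES = 3     # after this many failures, stop reprompting


@dataclass
class Turn:
    action: str  # "accept", "confirm", "reprompt", "help", or "give_up"
    prompt: str


def next_turn(heard, confidence, failures, help_text):
    """Choose the system's next move after one recognition attempt.

    heard:      recognized text, or None if nothing was recognized
    confidence: recognizer score in [0, 1] (hypothetical scale)
    failures:   consecutive failed attempts before this one
    help_text:  fuller instructions, held back until they are needed
    """
    if heard is not None and confidence >= CONFIRM_BELOW:
        return Turn("accept", "")
    if heard is not None and confidence >= REJECT_BELOW:
        # Confirmation and correction: check a doubtful result with the caller.
        return Turn("confirm", f'Did you say "{heard}"?')

    failures += 1
    if failures == 1:
        # Rapid reprompt: kept short so the caller can simply repeat.
        return Turn("reprompt", "Sorry?")
    if failures < MAX_FAILURES:
        # Delayed help: explain the options only after repeated trouble.
        return Turn("help", f"Sorry, I still didn't get that. {help_text}")
    # Assumed last resort, not described in the source.
    return Turn("give_up", "Let me get someone to help you.")
```

The point of the ordering is that a caller who merely mumbled hears only a brief "Sorry?", while one who is struggling gets progressively more guidance instead of the same prompt repeated.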

Speech recognition systems can satisfy mass-market expectations. Mass-market research exists to set initial benchmarks, and usability findings also exist for some applications.

Larger organizations use consumer usability studies, studies and analyses with consumers in their homes, and user-aided design processes that develop prototypes. User-centered research and design methods, and dialog design techniques that emphasize error recovery, are particularly applicable.



