Emotion and Voice User Interfaces

September 22, 2008

When you hear the term voice user interface (VUI), what comes to mind? Most likely, memories of an interactive voice response system (IVR) for customer service arise. IVRs are certainly not going away. For many companies, they remain the foremost contact point with customers. But voice user interfaces are more than just IVRs. In fact, VUIs have tremendous potential for enhancing the experience of any mobile phone user. As the use of mobile devices and applications proliferates internationally, understanding how to integrate, or mash up, graphical user interfaces (GUIs) and VUIs is becoming critically important.

Among the considerations for designing a VUI is emotion. An article in Communications of the ACM, “Speech Interfaces from an Evolutionary Perspective,” noted that, traditionally, humans use speech with other humans who are in close proximity. Therefore, when a person speaks to or listens to a voice user interface, the person assumes a certain amount of emotional involvement with the interface. Ignoring this assumption can mean disaster for a VUI. To help bring attention to the issue of emotion in VUI design, this article describes some important factors in VUI design and provides some examples of voice user interfaces.


Challenge: Overcoming Negative Perceptions of the VUI

VUI designers have a special challenge to overcome—thanks to the common, and often poorly designed, customer service IVR. Many people have had bad experiences with IVRs, such as the following:

  • feeling forced to use an IVR for an issue a live person should more appropriately handle
  • having a speech-recognition IVR—that is supposed to understand users’ speech—fail to understand or hear what they are saying
  • getting lost in a menu system
  • becoming caught in an endless loop
  • having to listen to marketing messages while trying to use an IVR
  • experiencing problems with the voice of the IVR system, such as its speaking too fast or having an inappropriate tone

People have also used IVRs in the context of performing mundane tasks or resolving inconvenient problems. Therefore, users’ feelings toward these tasks have become associated with the voice user interface. In our testing of IVRs with users, we have rarely seen a user rate a customer service IVR as better than satisfactory. Most people simply are not excited about using IVRs, and many people even hate them.

Although these negative perceptions of VUIs pose a challenge, we are confident designers can overcome them. A mobile VUI is useful for many situations beyond customer service—such as when traveling or doing field work. To get or provide information quickly and easily on the go, it’s often easier to say a command than type using a mobile keypad or use a small GUI. For instance, AT&T Labs has prototyped voice-controlled applications that use AT&T Watson Speech Recognition technology. One recently released application is a voice-enabled Yellow mobile Web site for the iPhone and BlackBerry. Simply by speaking, users can now enter search criteria.

We think VUI options for mobile applications will only increase as the technology continues to improve. In addition to being useful in diverse situations, mobile phone user interfaces include a visual component that greatly reduces the usability drawbacks of voice-only user interfaces—such as getting lost. Finally, to improve user perceptions of VUIs, their designers can pay better heed to emotional factors in the design of any VUI voice, whether for a mobile mashup or a classic IVR.

Figure 1—Voice-enabled mobile Yellow Web site for iPhone

Designing the VUI Voice

For a VUI, the voice is not unlike the colors and images in a GUI. It is a key means of giving the interface a feel, or tone. Many of the recommendations that follow require a good relationship with professional voice talent, or the person recording sentences, words, and sounds for the interface.

  • Test voices with your users. Using the wrong voice for your VUI can be just as problematic as using the wrong colors in a GUI. Moreover, different cultures can have different perceptions of what the right voice is. However, we are not aware of a systematic reference that describes cultural perceptions of voice. It can be challenging to address cultural issues without stereotyping, so we strongly recommend testing different voices with your users.
  • Use dialect carefully. For instance, if your users are from a certain locale or regional area, you can take advantage of local or regional dialect to make the voice less formal and, therefore, less intimidating. However, if your users are from around the nation or around the world, stay away from dialect, because it could be hard for some users to understand.
  • Ensure adequate volume and a pleasing timbre. A voice that is too loud or too soft or is poorly recorded causes usability problems. It also affects the interface’s tone. For example, a voice that is too loud can seem like it is shouting, and a voice that is too soft can seem to lack confidence. Not surprisingly, research such as that by the MIT Media Lab suggests that voice volume correlates to speech persuasiveness.
  • Pay attention to the voice’s pace. As with volume, a voice that is too fast or too slow causes usability problems. It also affects the interface’s tone. For instance, a voice that is too fast can seem aggressive or nervous, and a voice that is too slow can seem unintelligent or unconfident.
  • Use the right voice pitch. A voice pitch that is too high can be grating or difficult to hear, and a voice pitch that is too low can be difficult to hear or intimidating.
  • Emphasize tone in the voice script. If the VUI depends on recorded scripts, designers need to specify tone in the scripts. Such specifications help ensure the voice talent reads the scripts appropriately. For instance, a script that involves reading a bill statement should sound fairly neutral, not enthusiastically happy. From past testing, we know American customers do not like overly chirpy voices telling them how much money they owe.
  • Consider whether the voice is a personalization opportunity. GPS devices such as those by TomTom let customers change the VUI voice and even download more voices—everything from a British female to Mr. T. Allowing users to change the voice makes the VUI less intimidating, more familiar, and even fun.
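For synthesized prompts, voice qualities such as volume, pace, and pitch can also be specified programmatically using the W3C Speech Synthesis Markup Language (SSML), which many text-to-speech engines accept. The following Python sketch builds an SSML prompt; the prompt text and prosody settings are illustrative, not taken from any particular IVR platform.

```python
# Sketch: wrapping prompt text in SSML prosody markup so a
# text-to-speech engine reads it with a chosen volume, pace, and pitch.

def ssml_prompt(text, rate="medium", pitch="medium", volume="medium"):
    """Return an SSML document that reads `text` with the given prosody."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis">'
        f'<prosody rate="{rate}" pitch="{pitch}" volume="{volume}">'
        f'{text}'
        '</prosody>'
        '</speak>'
    )

# A bill-statement prompt kept deliberately neutral: moderate pace,
# medium pitch, no extra volume--not enthusiastically happy.
prompt = ssml_prompt("Your current balance is 42 dollars.")
```

The same attribute values accept relative settings, such as a slower rate for unfamiliar terms, so a designer can tune tone prompt by prompt rather than re-recording.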

Figure 2—Custom voices are available from TomTom

The Near Future: Analyzing the User’s Voice

Speech-recognition technology is the capability of a machine or program to recognize and carry out voice commands or take dictation. As this technology grows more sophisticated, it will recognize not only what a user says, but also how the user says it, and, consequently, enhance the user experience. Two applications that are becoming more common include

  • voice authentication—No more must callers authenticate themselves by reciting lengthy account numbers and hard-to-remember PIN codes. With voice authentication, users can simply say a few words into the phone. Voice authentication works by capturing the physical characteristics of the vocal tract and comparing them to a voiceprint stored on the system.
  • emotion-based assistance—By analyzing the user’s tone, volume of speech, and choice of words, an IVR system can determine whether a user is in an extremely emotional state and respond accordingly. For instance, an IVR system could identify when a user is getting angry and immediately forward the user to a live customer service representative.
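To make these two ideas concrete, here is a minimal Python sketch of both: comparing a caller’s voice features to a stored voiceprint via cosine similarity, and routing a visibly angry caller to a live agent based on word choice and volume. The feature vectors, word list, and thresholds are made-up placeholders; production systems rely on trained acoustic and language models rather than heuristics like these.

```python
import math

def voiceprint_match(sample, stored, threshold=0.95):
    """Compare a caller's voice-feature vector to a stored voiceprint
    using cosine similarity; accept the caller above the threshold."""
    dot = sum(a * b for a, b in zip(sample, stored))
    norm = (math.sqrt(sum(a * a for a in sample)) *
            math.sqrt(sum(b * b for b in stored)))
    return dot / norm >= threshold

# Hypothetical indicators of an angry caller.
ANGRY_WORDS = {"ridiculous", "unacceptable", "supervisor"}

def route_call(transcript, volume_db):
    """Forward callers who sound angry--loud speech or angry word
    choice--to a live representative; keep calm callers in the IVR."""
    words = set(transcript.lower().split())
    angry = bool(words & ANGRY_WORDS) or volume_db > 75
    return "live_agent" if angry else "ivr"
```

A real deployment would feed the router continuous acoustic cues, such as pitch and speaking rate, rather than a single volume reading, but the routing decision itself would look much the same.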


The future for VUIs, both in mobile mashups and IVRs, is bright. As speech technology blossoms, emotion will play a more significant role in VUI design. Already, we can better attune the VUI voice to a user’s emotions. Very soon, we’ll be able to analyze a user’s voice for emotional state, then have the VUI react accordingly. UX professionals can design VUIs to address emotion effectively and, thus, make the most of the voice user experience. 

Partner and Technology and UX Lead at threebrick

Atlanta, Georgia, USA

Darnell Clayton

For over 10 years, working with companies such as AT&T, Cingular Wireless, IBM, HP, and EzGov, Darnell has designed and developed innovative user experiences for mobile phones, interactive voice response systems (IVRs), and the Web. He is currently focused on developing CCXML, CallXML, and VoiceXML software solutions for small businesses.

Founder and Principal at Content Science

Atlanta, Georgia, USA

Colleen Jones

An enthusiastic pioneer of content strategy and user experience, Colleen has led and supported strategic initiatives for large global brands such as Philips, InterContinental Hotels Group, and The Home Depot. She has a wealth of content strategy experience, having held key leadership positions at threebrick, which she co-founded; Spunlogic, now Engauge Digital; the Centers for Disease Control & Prevention (CDC); and Cingular Wireless, now AT&T. Colleen holds a B.A. in English and Technical Writing and an M.A. in Technical Communication from James Madison University. A participant in the landmark Content Strategy Consortium at IA Summit 2009, Colleen is very active in the Atlanta UX and content strategy communities, as well as a notable author on content strategy and user experience. She is the author of the forthcoming book Clout: The Art and Science of Influential Web Content.
