From GUI to E(motional) UI

September 11, 2006

In 1988, Apple Computer produced some video scenarios showing how future computers would be able to understand hand gestures, read text, and respond to voice commands. Almost 20 years later, the world is still waiting for a natural way of using computers, though we are beginning to see some of our wildest dreams slowly emerge from the chaos of high technology and become real. In 2006, it is easy to believe that the masses will soon be able to use a computer without a keyboard or mouse. Beyond the constrained space of our personal computer's monitor, keyboard, and mouse, I'm looking for the sort of revolution that would surpass even the wild dreams of Blade Runner. I can envision huge 3D virtual worlds and systems that are smart enough to sense a user's mood and respond intelligently. Now, where do you want to go today?


The Evolution of an Idea

For three years, from 2003 to 2006, my work focused on eyetracking studies. During that time, my team and I developed a genuinely important innovation in interaction design: user interfaces based on eyetracking that could radically transform people's everyday use of personal computers. The computer seems magical, because it can understand what you really want to do. Personal computers that have eye-driven user interfaces can predict what you want to do next, because your gaze shifts to your next target very quickly, almost before you are aware of it. These computers don't use AI or smart algorithms. They simply provide a way for people to communicate with them directly.

Throughout 2005, I encountered some amazing new technologies that could potentially change our usual way of interacting with computers. One such technology is Jeff Han’s multi-touch sensing system, which lets one or more people interact with a computer by touching a large screen that supports many new modes of interaction. Users can touch, drag, or select any object on the screen. While there’s nothing particularly innovative about such interactions in themselves, this system works really well. Han’s multi-touch sensing system represents the state of the art in tangible user interfaces.

Some months ago, Nintendo introduced another fascinating way of interacting with computers: a motion-sensing controller that lets you play a game freely, swinging it like a virtual lightsaber! Although this device is designed for gaming, it shows the potential for radically new input devices.

Early in 2006, I joined a new product development team. We considered how we could bring together all of these new input devices and, more importantly, how users could transmit emotions like stress or calmness to machines. Within a few months, we discovered that many applicable technologies already existed, but had been left on the back burner. For example, there is a voice-based mood recognition system that can detect a user's emotional state. Although we could have combined these independently evolved technologies in several different ways, we decided to design a CRM (Customer Relationship Management) decision tree that lets a computer choose what to say to a user based on his or her mood.
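To make the idea of a mood-driven CRM decision tree concrete, here is a minimal Python sketch. It is not the system described above: the `stress` score (assumed to come from a mood recognizer, from 0.0 for calm to 1.0 for very stressed), the `topic` field, the 0.7 threshold, and the replies are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    reply: str  # what the assistant says when this branch is reached


@dataclass
class Node:
    question: Callable[[dict], bool]  # yes/no test on the user's current state
    yes: "Tree"
    no: "Tree"


Tree = Union[Leaf, Node]


def choose_reply(tree: Tree, state: dict) -> str:
    """Walk from the root, following yes/no branches, until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.question(state) else tree.no
    return tree.reply


# Hypothetical CRM tree; "stress" would come from a mood recognizer.
crm_tree = Node(
    question=lambda s: s["stress"] > 0.7,
    yes=Node(
        question=lambda s: s["topic"] == "billing",
        yes=Leaf("I understand this is frustrating. Let me connect you to an agent right away."),
        no=Leaf("Sorry for the trouble. Let's sort this out quickly."),
    ),
    no=Leaf("Happy to help. What would you like to do today?"),
)

print(choose_reply(crm_tree, {"stress": 0.9, "topic": "billing"}))
```

Every reply must be anticipated by a branch, so covering a new mood or topic means growing the tree by hand. That rigidity is the limitation the next paragraph describes.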

After this first experience, we realized that software based on a decision tree would not be powerful enough to sustain credibility over the long run. In contrast, using an artificial intelligence framework and restricting the computer's knowledge to a specific topic or field enabled our system to give the right answers more reliably, depending on a user's mood.

Now, we are working on a mashup application that uses open source APIs (Application Programming Interfaces) for general-purpose Web sites, but enriches standard XML by adding an emotional layer to the user experience. In this way, we have created a computer system that can provide information with emotional connotations to users, even if this information comes from an external source. For example, we have created a virtual personal assistant that can more or less emulate the mood of a user: depending on the user's specific mood, the assistant may seem calm or hurried.
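A minimal sketch of what adding an emotional layer to standard XML might look like, using Python's standard `xml.etree.ElementTree` module. The `<emotion>` element, its `tone` and `speech_rate` attributes, the sample item, and the mood-to-style mapping are hypothetical; they are not eXML or any published schema. Following the description above, the assistant simply emulates the user's mood.

```python
import xml.etree.ElementTree as ET

# A plain item as it might arrive from an external source (illustrative data).
SOURCE = "<item><title>Train 2041 is delayed</title><body>New departure: 18:40.</body></item>"

# Hypothetical mapping: the assistant emulates the detected user mood.
STYLES = {
    "calm": {"tone": "calm", "speech_rate": "slow"},
    "hurried": {"tone": "hurried", "speech_rate": "fast"},
}
DEFAULT_STYLE = {"tone": "neutral", "speech_rate": "normal"}


def add_emotional_layer(xml_text: str, user_mood: str) -> str:
    """Return the same XML with a hypothetical <emotion> element appended."""
    item = ET.fromstring(xml_text)
    emotion = ET.SubElement(item, "emotion")
    for name, value in STYLES.get(user_mood, DEFAULT_STYLE).items():
        emotion.set(name, value)
    return ET.tostring(item, encoding="unicode")


print(add_emotional_layer(SOURCE, "hurried"))
```

The original content from the external source is left untouched; a speech synthesizer or animated assistant downstream could read the added attributes to deliver the same information calmly or hurriedly.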

I like to think about this new phase we're entering as one of technological biodiversity, as opposed to the GUI (Graphical User Interface) homogeneity we have experienced so far.

The Future of Interactivity

In the future, human/computer interaction will likely comprise multimodal inputs rather than just keystrokes and mouse clicks. Of course, such a change requires that user interfaces evolve from windowing systems to multifaceted environments in which emotions play a critical role, both as inputs that allow computers to understand a little more about us and as outputs. Such computers should appear intelligent and be much more forgiving of human error than windowing interfaces.

For us, the key to making this future happen was our decision to build anthropomorphous software, using existing technology. Anthropomorphous software has the following capabilities:

understands unformatted text
understands human speech, by means of voice recognition technology
accepts other input from touch screens, eyetracking devices, DTMF (Dual-Tone Multi-Frequency) signals, etcetera
reads text, using vocal synthesis technology
writes text and delivers documents that are stored in a knowledge base system
performs actions like sending email messages, sending SMS (Short Message Service) text messages, and printing receipts
interacts with users, emulating the personalities, emotional responses, and behavior of human beings
dialogues with users in a natural way, with the end goal of understanding their needs, by means of expert-system technology

Simulating Human Behavior

With the decision to use anthropomorphous software as our starting point, we could design a new kind of computer that communicates with users via enhanced natural output and input devices and appears to have some social skills, as shown in Figure 1.

Figure 1—Leandro with his virtual assistants

This computer comprises the following modules:

Input Module—Collects all voice, text, touch-screen, and other inputs and analyzes their semantic meaning.
Representation Module—Analyzes the flow and context of the dialogue through which the inputs are generated.
Knowledge Module—Represents internal and external knowledge bases.
Emotions Module—Represents a range of emotions, defines their programmatic equivalents, and includes the program functions that map the inputs and the representations to emotions.
Behavior Module—Describes the personality that is unique to each virtual assistant and filters the emotions through that personality.
Output Module—Displays the result of the emotional output computation and performs the related rendering.
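The six modules can be read as a pipeline. The following Python sketch wires them together in the order listed. Every method body is a deliberately crude, hypothetical stand-in (keyword spotting for semantic analysis, a dictionary for the knowledge base, per-emotion multipliers for personality), meant only to show how data might flow from one module to the next, not how our system implements them.

```python
from dataclasses import dataclass, field

NEGATIVE_WORDS = {"late", "wrong", "again", "angry"}  # toy sentiment cues (hypothetical)


@dataclass
class Turn:
    text: str
    channel: str  # "voice", "text", "touch", ... (not used by this toy version)


@dataclass
class Assistant:
    personality: dict  # Behavior Module: per-emotion gain, e.g. {"concern": 1.5}
    knowledge: dict    # Knowledge Module: topic -> answer
    history: list = field(default_factory=list)  # dialogue so far

    def input_module(self, turn):
        # Crude semantic analysis: keyword spotting stands in for real NLU.
        words = [w.strip(".,!?") for w in turn.text.lower().split()]
        topic = next((w for w in words if w in self.knowledge), None)
        return {"topic": topic, "negative": any(w in NEGATIVE_WORDS for w in words)}

    def representation_module(self, meaning):
        # Dialogue context: number of consecutive negative turns, ending now.
        self.history.append(meaning)
        streak = 0
        for past in reversed(self.history):
            if not past["negative"]:
                break
            streak += 1
        return {"negative_streak": streak}

    def emotions_module(self, meaning, context):
        # Map inputs and context to emotion intensities in [0, 1].
        concern = min(1.0, 0.4 * context["negative_streak"])
        joy = 0.0 if meaning["negative"] else 0.6
        return {"concern": concern, "joy": joy}

    def behavior_module(self, emotions):
        # Filter emotions through this assistant's personality (default gain 1.0).
        return {e: min(1.0, v * self.personality.get(e, 1.0)) for e, v in emotions.items()}

    def output_module(self, meaning, emotions):
        # Render as text tagged with the dominant expression.
        answer = self.knowledge.get(meaning["topic"], "Could you tell me a bit more?")
        expression = max(emotions, key=emotions.get)
        return f"[{expression}] {answer}"

    def respond(self, turn):
        meaning = self.input_module(turn)
        context = self.representation_module(meaning)
        emotions = self.behavior_module(self.emotions_module(meaning, context))
        return self.output_module(meaning, emotions)


assistant = Assistant(personality={"concern": 1.5},
                      knowledge={"train": "The next train leaves at 18:40."})
print(assistant.respond(Turn("My train is late again!", "voice")))
```

Two assistants sharing the same knowledge base but different `personality` gains would show different expressions for the same input, which is the role the Behavior Module plays.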

A virtual assistant that has a lifelike human face can enhance interactive applications by providing straightforward feedback to users and stimulating emotional responses in users. Figures 2 through 6 show how lifelike our virtual assistants are.

Figure 3—The perfect banking assistant

Figure 4—A virtual ticketing service

Figure 5—Our friendly ticketing assistant

Figure 6—An assistant with lifelike facial expressions

Today, a typical PC can do 3D rendering in real time and play MPEG-4 movies. Either of these standard approaches lets us create and animate virtual characters. I recently discovered that the MPEG-4 multimedia standard includes a Facial and Body Animation (FBA) object that describes the geometry of a virtual character and animates it.

The FBA specification defines FDPs (Face Definition Parameters) and BDPs (Body Definition Parameters). Using these parameters, a decoder can create an FBA model that has the specified shape and texture. The MPEG-4 standard also makes it possible to define facial expressions. A hypothetical 3D engine based on this standard could represent emotions and apply them to the animation of any face.
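MPEG-4 itself animates faces through standardized Facial Animation Parameters (FAPs); the sketch below does not implement that format. It only illustrates, under assumed data, the underlying idea such a hypothetical engine could use: store how each facial feature point moves at full intensity of an expression, then blend expressions linearly by emotion intensity. The point names, coordinates, and displacements are invented for illustration.

```python
# Neutral positions of a few facial feature points (x, y); illustrative values.
NEUTRAL = {
    "mouth_left": (-1.0, 0.0),
    "mouth_right": (1.0, 0.0),
    "brow_inner_left": (-0.3, 2.0),
}

# Hypothetical displacement of each point at full intensity (1.0) of an expression.
EXPRESSIONS = {
    "joy": {"mouth_left": (-0.2, 0.4), "mouth_right": (0.2, 0.4)},
    "sadness": {"mouth_left": (0.0, -0.3), "mouth_right": (0.0, -0.3),
                "brow_inner_left": (0.0, 0.2)},
}


def apply_emotions(neutral, expressions, weights):
    """Linear blend: point = neutral + sum over emotions of weight * displacement."""
    face = {}
    for point, (x, y) in neutral.items():
        for name, w in weights.items():
            # Points an expression does not move get a zero displacement.
            dx, dy = expressions.get(name, {}).get(point, (0.0, 0.0))
            x += w * dx
            y += w * dy
        face[point] = (x, y)
    return face


print(apply_emotions(NEUTRAL, EXPRESSIONS, {"joy": 0.5}))
```

Because the blend is linear, mixed emotions such as `{"joy": 0.3, "sadness": 0.6}` combine naturally, and the same expression table can drive any face model whose feature points share these names.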

This example of how we could use MPEG-4 to animate virtual characters demonstrates that no technical constraints stand in the way of our evolution from GUI to EUI (Emotional User Interface). Around the world, other companies and academic research centers are already creating emotional models and describing them as XML files, in this case eXML (Emotional XML). Plus, 3D engines exist that can transform this eXML by using a simple script in a 3D movie file. All of this shows that we will very soon be able to use emotions in human/computer interactions!

Ready for EUI?

What do we really want to do with our desktop PCs that are 100,000 times faster than an old Apple I? What should we do with our 1GB mobile phones and our super-fast memory and broadband communications? From my point of view, EUI is the answer! I truly want to realize the Knowledge Navigator dream. I want devices that recognize and respond to eye gaze, gesture, and voice inputs—not mechanically, but taking human behaviors and emotions into account.

If you think virtual assistants (VAs) are not intelligent enough, you're probably right. I'm aware that a computer is just a computer. I also know that I don't need a personal assistant who is a genius. However, I do need a computer that doesn't drive me crazy by making stupid requests every five minutes. As it is, sometimes when I click a button in a message box, I feel that I'm working for the computer!

Currently, communication between humans and computers is very cold. What I want is fewer clicks and more feeling. So, I hope computer systems will evolve toward EUIs that are more enjoyable to use than our current windowing environments. By pursuing this EUI approach, we can create computers that learn from humanity. In my view, because of the capabilities that eXML and MPEG-4 FBA provide, we have reached the point in human/computer interaction at which this is possible.

We are now entering a new era of computing, and computers that emulate human behavior are looking smarter every day. EUIs that offer near-human interaction await us!

Links to videos that demonstrate some EUIs we’ve created:



Michael Zuschlag

September 26, 2006 11:50 PM

How ironic that we think we can get more exact results from our computers by emulating human interaction, but when we want exact results from human interaction, we unintentionally emulate computers. Engineering, air traffic control, legal contracts—in all endeavors where precise communication is critical—our success has depended on washing out human emotion and natural language in favor of formal procedures and protocols, complete with a detailed domain-specific language.

leeander

September 27, 2006 3:49 PM

Humans are the best form of intelligence on this planet. However, as humans, we've evolved mostly through thousands of years of imprecise communications enriched by emotions.

We need to find the right mix of logic plus emotions, so emotions will emerge in the inputs and the outputs of machines with different weights.

Today in HCI, emotion's two-way channel is poor from humans to machines (poor inputs) and rich from machines to humans. Still, by working on this rich output channel, we think we can balance this relationship.

However, air traffic control is a highly critical task. On one hand, I know that it is possible to do better than existing systems, but in the meantime, I'm happy to be safer, even if someone has to sacrifice human emotion and natural language.

Amir D

February 10, 2007 1:07 PM

While reading this interesting article, I couldn’t help but wonder how effective virtual characters are. Presenting virtual characters to users is not a new idea, of course, and almost all the research I’ve read on the subject seems to suggest that people just don’t buy into it and don’t trust this thing, not to mention the fact they feel they are being patronized by these animations.

Personally, I can't even begin to imagine interacting with a virtual banking assistant, simply because I won't trust it as much as I would trust a simple and usable HTML form. The fact there's something so artificial about the situation worries me as a user, and I don't think this hyper-reality scenario is the best solution.

I share your views regarding seamless interaction between humans and computers, but we must not forget that humans are also extremely suspicious, especially when it comes to technology.

Amir

ana

February 27, 2008 6:28 AM

Cool way of developing a new customer base. I would love to talk to this virtual assistant. (It’s like you are inside the net as well, heehee.)

Kazagistar

June 19, 2008 10:15 PM

When people talk to a human, they expect human-level responses, and when they talk to a computer, they expect computer-level responses. Adding a face to computers will only be truly viable once the back-end intelligence is good enough to act and respond in a human way; until then, a human face only creates frustration amongst subconsciously disillusioned users.

Human-to-computer input needs to be improved, by all means, but keeping it computer-like keeps the expectations rightfully low and makes any hints of intelligence in a system a pleasant surprise.

Shanker Jegan

July 3, 2008 7:56 PM

Trust me, this is where I do research now. Currently, I'm working on a project with an IVR-based user interface, on which I have spent more than 4 years. I'm in the middle of developing the portal for this. Call me at +94786433964 for more details. I would love to join your team, so we may do some wonders!!!!!

Leandro Agrò

Digital Product Director at Design Group Italia

Milan, Italy

Leandro has more than 10 years of experience as an interaction designer and manager of IxD teams. He specializes in next generation user interfaces that provide human-like interaction with intelligent virtual assistants. At Key-One, in Milan, Italy, Leandro is providing detailed specifications for the behaviors of such virtual assistants, including gesture, language, and social skills; defining emotional models that let these assistants respond to the moods of users; and designing visual user interfaces and voice-recognition systems. Before joining Key-One, Leandro was Advanced Design Director at the innovative startup SrLabs, where he focused on eyetracking for the usability market. While there, he led a design team that created the first hands-free, multimodal GUI with voice and gaze input. Previously, he was co-founder and Vice President of a large eConsultancy with offices in Milan and Boston. Leandro studied interaction design at the Domus Academy, in Milan, Italy. He is actively involved with the Milano Bicocca University, Tor Vergata University, and others. Leandro is a prolific writer on topics from IxD, usability, and UX to natural, multimodal user interfaces. He was the founder of Idearium.org, the first Italian eZine/community for UX designers, and is co-producer of the Interaction Frontiers conference.