From GUI to E(motional) UI
Published: September 11, 2006
In 1988, Apple Computer produced some video scenarios showing how future computers would be able to understand hand gestures, read text, and respond to voice commands. Almost 20 years later, the world is still waiting for a natural way of using computers—though we are beginning to see some of our wildest dreams slowly emerge from the chaos of high technology and become real. In 2006, it is easy to believe that the masses will soon be able to use a computer without any keyboard or mouse. Beyond the constrained space of our personal computer’s monitor, keyboard, and mouse, I’m looking for the sort of revolution that would overtake the wild dream of Blade Runner. I can envision huge 3D virtual worlds and systems that are smart enough to feel a user’s mood and respond intelligently. Now, where do you want to go today?
The Evolution of an Idea
For three years, from 2003 to 2006, my work focused on eyetracking studies. During that time, my team and I discovered a really important innovation in interaction design. We created user interfaces based on eyetracking that could radically transform people’s everyday use of personal computers. The computer seems magical, because it can understand what you really want to do. Personal computers that have eye-driven user interfaces can predict what you want to do next, because your attention and the focus of your vision shift so quickly—almost before you are aware of it. These computers don’t use AI or smart algorithms. They just provide a way for people to communicate with them directly.
Throughout 2005, I encountered some amazing new technologies that could potentially change our usual way of interacting with computers. One such technology is Jeff Han’s multi-touch sensing system, which lets one or more people interact with a computer by touching a large screen that supports many new modes of interaction. Users can touch, drag, or select any object on the screen. While there’s nothing particularly innovative about such interactions in themselves, this system works really well. Han’s multi-touch sensing system represents the state of the art in tangible user interfaces.
Some months ago, Nintendo introduced another fascinating way of interacting with computers: the perfect way to play a game freely, using a virtual lightsaber! While this device is strongly related to gaming, it shows the potential for radically new input devices.
Early in 2006, I became involved with a new product development team. We considered how we could bring together all of these new input devices and, more importantly, how users could transmit emotions like stress or calmness to machines. Within a few months, we discovered that a lot of applicable technologies already existed, but were on the back burner. For example, there is a voice-based mood recognition system that can understand the emotional status of a user. While we could combine current technologies that have evolved independently of one another in several different ways, we decided to design a CRM (Customer Relationship Management) decision tree that lets a computer choose what to say to a user, depending on his or her mood.
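To make the idea concrete, here is a minimal sketch of a mood-aware decision tree. Everything in it is illustrative: the mood detector stands in for a real voice-based recognizer, and the speech-rate threshold and canned replies are invented for this example.

```python
def detect_mood(voice_features):
    """Stand-in for a voice-based mood recognizer: maps a measured
    speech rate (words per minute) to a coarse mood label. A real
    system would analyze pitch, energy, and timing, not one number."""
    if voice_features["speech_rate"] > 170:
        return "stressed"
    return "calm"

# Each branch of the decision tree pairs a detected mood with the
# reply the assistant should choose.
RESPONSES = {
    "stressed": "I can see you're in a hurry. Let's get straight to your balance.",
    "calm": "Good morning! How can I help you with your account today?",
}

def choose_reply(voice_features):
    return RESPONSES[detect_mood(voice_features)]
```

The point is not the two-way branch itself, but that the branching key is the user's emotional state rather than a menu selection.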
After this first experience, we became aware that software based on a decision-tree could not be sufficiently powerful to sustain credibility over the long run. On the other hand, by using an artificial intelligence framework and restricting the computer’s knowledge to a specific topic or field, we were able to enhance the ability of our system to give the right answers, depending on a user’s mood.
Now, we are working to mash up an application that uses open source APIs (Application Program Interfaces) for general purpose Web sites, but enriches standard XML by adding an emotional layer to the user experience. In this way, we have created a computer system that has the ability to provide information with emotional connotations to users—even if this information comes from an external source. For example, we have created a virtual personal assistant that can more or less emulate the mood of a user. Depending on a user’s specific mood, the assistant may seem calm or hurried.
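One way to picture such an emotional layer: take text fetched from an ordinary Web API and wrap it in XML that carries a mood annotation for the rendering engine to act on. The element and attribute names below are invented for this sketch, not the format we actually use.

```python
from xml.etree import ElementTree as ET

def wrap_with_emotion(text, mood):
    """Wrap plain content in a hypothetical emotional-XML envelope.
    The downstream renderer would read the mood and adjust the
    assistant's voice, pacing, and facial expression accordingly."""
    root = ET.Element("response")
    ET.SubElement(root, "emotion", state=mood)
    ET.SubElement(root, "content").text = text
    return ET.tostring(root, encoding="unicode")
```

For instance, `wrap_with_emotion("Your flight is delayed.", "apologetic")` tells the assistant not just what to say, but how to say it.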
I like to think about this new phase we’re entering as one of technological biodiversity—as opposed to the GUI (Graphic User Interface) homogeneity we have experienced so far.
The Future of Interactivity
In the future, human/computer interaction will likely comprise multimodal inputs rather than just keystrokes and mouse clicks. Of course, such a change requires that user interfaces evolve from windowing systems to multifaceted environments in which emotions play a critical role—emotions both as inputs, allowing computers to understand a little bit more about us, and as outputs. Such computers should appear intelligent and be much more forgiving of human error than windowing interfaces.
For us, the key to making this future happen was our decision to build anthropomorphous software, using existing technology. Anthropomorphous software has the following capabilities:
- understands unformatted text
- understands human speech, by means of voice recognition technology
- accepts other input from touch screens, eyetracking devices, DTMF (Dual-Tone Multi-Frequency) signals, etcetera
- reads text, using vocal synthesis technology
- writes text and delivers documents that are stored in a knowledge base system
- performs actions like sending email messages, sending SMS (Short Message Service) text messages, and printing receipts
- interacts with users, emulating the personalities, emotional responses, and behavior of human beings
- dialogues with users in a natural way, with the end goal of understanding their needs, by means of expert-systems technology
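The capabilities above all funnel into a single dialogue engine. A hedged sketch of how that funneling might work: every channel is normalized to plain text before the system reasons about it. The channel names and handlers here are illustrative, not our actual API.

```python
from dataclasses import dataclass

@dataclass
class Input:
    channel: str   # e.g. "text", "voice", "touch", "gaze", "dtmf"
    payload: str

def normalize(event: Input) -> str:
    """Reduce every input channel to plain text so one dialogue
    engine can handle it. In a real system, "voice" would first pass
    through speech recognition and "gaze" through an eyetracker."""
    handlers = {
        "dtmf": lambda p: f"user pressed key {p}",
        "gaze": lambda p: f"user is looking at {p}",
    }
    return handlers.get(event.channel, lambda p: p)(event.payload)
```

Keeping one normalized representation is what lets the same knowledge base and personality serve a caller on a phone keypad and a customer at a touch screen.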
Simulating Human Behavior
With the decision to use anthropomorphous software as our starting point, we could design a new kind of computer that communicates with users via enhanced natural output and input devices and apparently has some social skills, as shown in Figure 1.
Figure 1—Leandro with his virtual assistants
This computer comprises the following modules:
- Input Module—Collects all voice, text, touch-screen, and other inputs and analyzes their semantic meaning.
- Representation Module—Analyzes the flow and context of the dialogue through which the inputs are generated.
- Knowledge Module—Represents internal and external knowledge bases.
- Emotions Module—Represents a range of emotions, defines their programmatic equivalents, and includes the program functions that map the inputs and the representations to emotions.
- Behavior Module—Describes the personality that is unique to each virtual assistant and filters the emotions through that personality.
- Output Module—Shows the result of emotional output computation and does related rendering.
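The six modules form a pipeline from raw input to rendered, emotionally colored output. The toy implementation below wires them together in order; each function stands in for a full subsystem, and the knowledge base, emotion rule, and personality are invented for illustration.

```python
def input_module(raw):
    # Semantic analysis of the raw input, reduced here to an intent label.
    return {"intent": raw.strip().lower()}

def representation_module(parsed, history):
    # Track the flow of the dialogue: which turn are we on?
    history.append(parsed["intent"])
    return {"intent": parsed["intent"], "turn": len(history)}

def knowledge_module(context):
    # Internal knowledge base; a real one would also query external sources.
    kb = {"balance": "Your balance is 42 euros."}
    return kb.get(context["intent"], "Let me look that up.")

def emotions_module(context):
    # Map dialogue context to an emotion; early turns read as friendly.
    return "warm" if context["turn"] <= 2 else "patient"

def behavior_module(emotion, personality="cheerful"):
    # Filter the emotion through this assistant's unique personality.
    return f"{personality}/{emotion}"

def output_module(answer, style):
    # Render the answer; a real module would drive voice and a 3D face.
    return f"[{style}] {answer}"

def respond(raw, history):
    parsed = input_module(raw)
    context = representation_module(parsed, history)
    return output_module(knowledge_module(context),
                         behavior_module(emotions_module(context)))
```

Note that the Emotions and Behavior modules sit between knowledge retrieval and output: the same factual answer is delivered differently depending on mood and personality.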
A virtual assistant that has a lifelike human face can enhance interactive applications by providing straightforward feedback to users and stimulating emotional responses in users. Figures 2 through 6 show how lifelike our virtual assistants are.
Figure 2—Virtual banking
Figure 3—The perfect banking assistant
Figure 4—A virtual ticketing service
Figure 5—Our friendly ticketing assistant
Figure 6—An assistant with lifelike facial expressions
Today, a typical PC can do 3D rendering in real time and play MPEG-4 movies. Either of these standard approaches lets us create and animate virtual characters. I recently discovered that the MPEG-4 multimedia standard includes a Facial and Body Animation (FBA) object that describes the geometry of a virtual character and animates it.
The FBA specification defines FDPs (Face Definition Parameters) and BDPs (Body Definition Parameters). Using these parameters, a decoder can create an FBA model that has the specified shape and texture. The MPEG-4 standard also offers the possibility of defining an expression. A hypothetical 3D engine based on this standard could represent emotions and apply them to the animation of any face.
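MPEG-4 FBA defines six primary facial expressions, selected through a dedicated facial animation parameter. A renderer could map the Emotions Module's output onto these expression IDs, as sketched below. The intensity scale and function names are assumptions of this sketch, not quoted from the standard.

```python
# The six primary expressions defined by MPEG-4 facial animation.
MPEG4_EXPRESSIONS = {
    1: "joy", 2: "sadness", 3: "anger",
    4: "fear", 5: "disgust", 6: "surprise",
}

def expression_for(emotion, intensity=32):
    """Return a (expression_id, intensity) pair for an MPEG-4 decoder.
    The 0-63 intensity range is an assumption of this sketch; consult
    the MPEG-4 specification for the exact parameter encoding."""
    ids = {name: i for i, name in MPEG4_EXPRESSIONS.items()}
    if not 0 <= intensity <= 63:
        raise ValueError("intensity out of assumed range")
    return ids[emotion], intensity
```

With such a mapping, an abstract emotion computed by the system becomes a concrete animation instruction for any FBA-compliant face.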
This example—showing how we could use MPEG-4 to animate virtual characters—demonstrates that there are no technical constraints stopping our evolution from GUI to EUI (Emotional User Interface). Around the world, other companies and academic research centers are already creating emotional models and describing them as XML files—in this case, eXML (Emotional XML). Plus, 3D engines exist that can transform this eXML, via a simple script, into a 3D movie file. All of this shows that we will very soon be able to use emotions in human/computer interactions!
Ready for EUI?
What do we really want to do with our desktop PCs that are 100,000 times faster than an old Apple I? What should we do with our 1GB mobile phones and our super-fast memory and broadband communications? From my point of view, EUI is the answer! I truly want to realize the Knowledge Navigator dream. I want devices that recognize and respond to eye gaze, gesture, and voice inputs—not mechanically, but taking human behaviors and emotions into account.
If you think virtual assistants (VAs) are not intelligent enough, you’re probably right. I’m conscious that a computer is just a computer. I also know that I don’t need a personal assistant who is a genius. However, I do need a computer that doesn’t drive me crazy by making stupid requests every five minutes. As it is, sometimes when I click a button in a message box, I feel that I’m working for the computer!
Currently, communications between humans and computers are very cold. What I want is fewer clicks and more feeling. So, I hope computer systems will evolve toward EUIs that are more enjoyable to use than our current windowing environments. By pursuing this EUI approach, we can create computers that can learn from humanity. In my view, because of the capabilities that eXML and MPEG-4 FBA provide, we have reached the moment in human/computer interaction at which this is possible.
We are now entering a new era of computing, and computers that emulate human behavior are looking smarter every day. EUIs that offer near-human interaction await us!