Eyes on the Road or Mind on the Road?
Published: April 2, 2012
Far too often, perceptions of what is cool and useful drive interactive design trends. We use our gut instinct and intuitive sense to identify design solutions. But whenever I think about this, I remember what a college professor taught my class about a complex concept in aerodynamics. As a few of us nodded our understanding, he proclaimed, “If you think that makes sense, you are wrong. It’s counterintuitive, so just make sure to memorize it the right way.”
Our intuition is not always correct, and not all systems bear internal analysis. User experience and interaction design are structured, evidence-based practices. We should not just trust our gut, but always seek to understand how things really work with real people—and why.
I was reminded of this just the other day. There’s a new application for Android that can simulate a transparent display, making the world beyond your phone the backdrop, always in motion. What struck me—aside from the fact that its Augmented Reality-like technology apparently is not unique—was that the reviews discuss how it’s a safety feature.
“...a new Android app that makes your screen, well, transparent. As a result, you can use most functions of your smartphone while being aware of objects and other people in front of you.”
Naturally, no one actually says this now means you can text in your car safely, but they nevertheless seem to make the argument: as long as your eyes can stay on the road and you keep both hands on the wheel, everything will be fine. Well, it’s just not true. And this problem is not limited to mobile calling or texting while driving. Augmented reality, head-up displays, telepresence, and interactions in distracting contexts all have similar perceptual and behavioral benefits and pitfalls.
This fake transparent screen provides another hint that we’re moving more and more toward truly ambient computing, and new user interfaces like actual transparent displays, ubiquitous gesture sensing, and other more unusual and unpredictable user interfaces. If we want these kinds of user experiences both to be safe and to make us more productive—instead of banishing computers from everywhere, not just voice and text from our cars—these systems need to work within all environments—and, in fact, work with these environments. This is not an entirely new frontier, though. Similar devices are already in use in government and industry—and more are now being tested in laboratories. Some specialized devices have been in use for decades and are in their third or fourth generation, and they have taught us many lessons.
So, where do we find information about how these devices work? Well, if you read articles about user experience, you presumably live and work in a design community, so start by asking your coworkers, friends in other design organizations, and anyone you know who actually uses such devices.
A Story About Getting Things Totally Wrong
I started really thinking about this topic—that is, about writing up the benefits and pitfalls of such devices—at dinner a couple of months ago. I was talking with some people from an interactive agency about the design challenges and the difficulties of understanding context. Knowing they had done some government work, I steered the conversation toward specific interactions and learned about a test they had done with a telepresent control platform they were working on.
The agency guys did most of their design work at their office—often surmising how end-user soldiers might work with it—but periodically they got permission to go to a nearby fort and observe first-time use, demonstrate features, and more or less perform an ethnographic study of the product. I say “more or less” because things kept going so badly that they would have to intervene. There were tasks the soldiers had to do besides robot driving, so the designers couldn’t just let the operators mess with the equipment forever. Plus, when guns and big robots that could theoretically squash you were involved, there were safety issues with just letting the soldiers freely figure it out on their own.
One intervention they recalled was when they simply gave the eyepiece and controller to a soldier, with minimal instruction, and told him to turn the robot to face the other way, so they could go down the road. He turned his body around and was surprised that this didn’t do anything. Instead, you have to press buttons on a hand controller to turn the robot, so they had to tell him this after a short delay. There were a number of issues with task failures and response times—even once the operators had become familiar with the system. This was consistently frustrating to the designers, since they could not understand why this was happening. The soldiers finally invited them to try it themselves, so one of them put on some loaner armor and a helmet and mounted the eyepiece and associated gear, then followed the team through the exercise.
He described this experience to me, saying, “Clearing rooms is hard.” They had completely failed to understand how many tasks the soldiers were routinely undertaking at once, how visual the tasks were, and how much cognitive load the soldiers were under. They immediately realized that displaying vehicle information over the unrelated surroundings didn’t make a bit of sense to users. When they went back to their office, they kept these lessons in mind and redesigned the equipment to be less obtrusive and capable of being dismissed, so users could focus on real-world tasks.
These designers realized that they had been applying design solutions from other, less-challenging contexts, and using their intuition to determine how the context change would influence people’s ability to use the equipment. They finally learned the reality, but only after building code and putting the solution in front of people.
The Mission or the Equipment?
I followed this conversation up with some brief questions to soldiers, Marines, and airmen that I know who operate UAVs (Unmanned Aerial Vehicles), RPVs (Remotely Piloted Vehicles), and drones in the field. Regardless of the device—whether winged, hovering, or wheeled—their control interfaces broke down into two categories—what you might call head-up and head-down.
Pretty much all of these devices require a laptop or tablet PC of some sort as the control unit, which, at least for now, soldiers have to carry around with them. The older, less-sexy devices tend to use that PC as the control unit as well—or at least rely on it for the display of information, even when a soldier uses a wand or other controller to give commands. This PC is, ideally, mounted to the operator’s armor, so they can hinge it down and use it as a sort of table that’s attached to them. There is no need to sit down to use it, but when using it, the operator is head-down, looking at nothing but the screen.
The other type of device includes some flavor of a see-through, head-up display that is attached to the operator’s helmet, glasses, or goggles. Display symbology—or even images and video—display in the operator’s peripheral vision, and they are designed to enable the operator to see right past them. The operator can then do his other job at the same time—for example, walking down a narrow trail at night without falling down.
Clearly, the head-up version is better, right? Sort of. Basically, it’s a wash. “In a tactical environment, outside any COP (Combat Outpost) or FOB (Forward Operating Base), the user must be protected due to tunnel vision.” The operators were assigned a bodyguard, whose job was to make sure they didn’t miss a threat and to remind them to stow the computer or switch their gaze to what was most important.
The best device they used was the head-mounted HUD (Head-Up Display) that was part of the Land Warrior package. Not because it was in front of the operator’s eye per se, but because it was hinged. This meant operators could easily flip it up and down. When flipped up, it is turned off and out of the operator’s line of vision, so it is no longer a distraction. Flipping it down instantly turns it on and displays relevant information on position, radios, and so on. “I would take a knee, view the information and then be back in the fight within a few seconds.”
As the designers whose story I told earlier had found, all of these devices probably look great on paper, work well in a cubicle, or even work well in a demonstration. But once operators get into the field and start doing multiple activities, the limitations of multitasking start to become obvious. Even users who are not trained in design and analysis can figure this out pretty easily after using poorly designed systems, and they’ll start to reject the Buck Rogers solution. One combat leader summarized this: “A key lesson learned through the use of all of this equipment is that the mission must drive the development of equipment, not the other way around.”
The World Is Complex Enough
Another resource all too many designers forget about is research. Not your own, but serious, scientific research. Unlike the research we do for industry, which is all too often secret, there is an ocean of published research on pretty much any topic about interactivity. This research is useful because it doesn’t just tell us what to do and avoid doing as designers. Most research in interactivity is based on cognitive psychology, social sciences, or physiology, so it tells us why users behave as they do or use information systems and interactive technologies in certain ways.
Remember when I said that the operator of the ground robot turned his body to turn the device? That is because people can really work in only one frame of reference at a time. Even the deliberately rudimentary telepresence he encountered was convincing enough to his brain that it was reasonable for him to assume this was the way to control it. And before you think that you would be smart enough not to do that, go find a game console and have someone film you playing a driving game. You will notice yourself leaning into turns and ducking when a tree branch flashes by. You cannot avoid doing this even with conscious effort.
These are both the pitfalls and benefits of telepresence and immersive display technology. Since these responses are baked into the way we perceive and process information, it’s not possible to design around or fix them. Even mitigation can be a fool’s errand. You have to understand and embrace the innate way in which users work.
Augmented reality and head-up displays work well when what they show corresponds to the environment a user is viewing; displays that violate this relationship are confusing.
Here are some guidelines for designing augmented reality and head-up displays:
- Keep it contextual. The information a head-up display presents—whether in the user’s current or a telepresent environment—must relate to the environment the user is viewing. If it’s about something else, the user has to shift focus—both visually and cognitively—to understand it.
- Consider the gestalt. There are significant reductions in a user’s speed of comprehension when the information is not related to the background—or when the background becomes too complex or moves rapidly. Even mature technologies like helicopter head-mounted displays require training and familiarization. Even trained operators have been observed cheating—that is, closing an eye in certain situations to reduce the amount of visual input. Consider the entire view coming into a user’s brain—both the entire visual scene and all other input such as audio cues or messages.
- Text is hard. Reading text always requires some interpretation. Overlaying text on a display always induces some extra difficulty in comprehension, so you should avoid displaying text when possible and minimize the amount of text whenever text is present. Display temperatures, speeds, or directions, but avoid displaying email messages or dialog boxes.
- Visuals are easier. Overlaying an enhanced view of the world to communicate additional information is generally easy for users to understand. For example, a safe stopping distance that appears on the display as a bar overlaid on the road—as though painted on it—is so easy to understand that it can provide input for split-second decisions like how hard it would be to stop in an emergency.
- Gaze and focus become key. A user has to shift gaze, focus, and cognition from the scene to interpreting letters and numbers. We do this all the time, even on desktop computers, but when the user interface is a distraction, it becomes a critical problem. When focusing on data, users are not paying attention to the outside world, and this is not just about the time their eyes look away.
The world is complex enough. The value of augmented reality and other related systems is in simplifying the user’s understanding of the information flow. If your system instead adds to its complexity, try to find a better solution.
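As a rough illustration of the stopping-distance bar mentioned in the guidelines above, the length of such a bar could be estimated from the vehicle’s speed. This is only a sketch using the standard textbook formula (reaction distance plus braking distance). The function name, the default reaction time, and the friction coefficient are assumptions made for this example, not values from any production system or from the research cited here.

```python
# Illustrative sketch: how long should a "safe stopping distance" bar be?
# The defaults below are assumed values for a dry road and an alert driver.

G = 9.81  # gravitational acceleration in m/s^2


def stopping_distance_m(speed_kmh: float,
                        reaction_time_s: float = 1.5,
                        friction: float = 0.7) -> float:
    """Return reaction distance plus braking distance, in metres."""
    if speed_kmh < 0 or reaction_time_s < 0 or friction <= 0:
        raise ValueError("speed and reaction time must be >= 0, friction > 0")
    v = speed_kmh / 3.6                    # convert km/h to m/s
    reaction = v * reaction_time_s         # distance covered before braking starts
    braking = v * v / (2 * friction * G)   # distance covered while braking
    return reaction + braking


if __name__ == "__main__":
    for kmh in (30, 50, 100):
        print(f"{kmh} km/h: draw a bar about {stopping_distance_m(kmh):.0f} m long")
```

The point of the design is that the driver never sees the number. The display would render only the bar, anchored to the road surface, so there is nothing to read or calculate.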
Cognitive Loads and Understanding Safety Guidelines
Another valuable discussion that comes up when considering the use of interactive devices in contextually challenging environments is safety—such as the use of mobile devices while driving. Recently, the US National Highway Traffic Safety Administration (NHTSA) proposed new guidelines for reducing distractions in vehicles. I applaud this effort in part: it no longer focuses on radio waves or whether devices are hand held, but instead recognizes that devices and user interfaces in cars are proliferating, that all of them can be distracting, and that we must consider them as an ecosystem in which a user resides.
However, the guidelines assume that the best measure of distraction relates to quite strict limits on the amount of time a driver’s eyes are off the road, and they claim that cognitive distraction is both hard to measure and probably not an issue. They say, “Unfortunately, we do not, at this time, have the ability to measure the cognitive load generated by reading. However, it seems reasonable that the cognitive distraction generated would vary depending upon what is being read. NHTSA believes that what are most commonly being read by drivers are signs or simple printed material that are not expected to generate high cognitive distraction.”
This seems to be entirely at odds with research on cognitive load. There are several good ways of measuring cognitive load and some decent models for predicting it, though it does vary from individual to individual. Beyond that, some misunderstandings seem to be spreading, such as the assumption that all distractions are equally bad. Instead, users have a bucket of cognitive potential. Adding tasks to this bucket has no real consequences until it is full.
But once users reach their cognitive overload level, adding another task is a bad thing, no matter how small the task:
- Error rates dramatically increase. Tasks that are simple become hard to accomplish.
- Tasks get dropped. Instead of the commonly accepted model of users mentally juggling multiple tasks, users simply stop paying attention to some tasks.
- Frustration increases. Users consciously stop performing a task or pick an arbitrary answer to avoid further interactions.
- Understanding decreases. Users accept information without evaluating its quality. Contrary to the intuitive sense again, people do not increase the complexity of their analysis strategy and instead accept information at face value.
- Knee-jerk reactions occur. Users employ stereotypical or historically satisfactory responses and often disregard new information that proves the situation to be different.
All of the tasks at hand suffer from these negative effects, not just the newest task. In fact, people are often unaware of which tasks they are performing worse or have stopped doing, much less making a conscious decision about them. An interactive task on a touchscreen might require so much effort that the tasks a user disregards are walking down the street or driving a car.
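The bucket idea described above can be sketched as a toy model. Everything in it is an assumption made for illustration: the `Task` type, the load numbers, and the linear penalty are invented, not measurements or a published predictive model. It shows only the shape of the argument: nothing happens until the bucket is full, and after that every task suffers, not just the newest one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    load: float  # assumed share of the user's cognitive capacity (0.0 to 1.0)


def overload(tasks, capacity=1.0):
    """Return how far the combined load exceeds capacity (0.0 if the bucket is not full)."""
    total = sum(t.load for t in tasks)
    return max(0.0, total - capacity)


def error_multiplier(tasks, capacity=1.0, penalty=5.0):
    """Toy rule: errors are unchanged below capacity, then rise for ALL tasks."""
    return 1.0 + penalty * overload(tasks, capacity)


if __name__ == "__main__":
    driving = Task("driving", 0.6)
    radio = Task("listening to the radio", 0.2)
    texting = Task("reading a text message", 0.4)

    for scenario in ([driving, radio], [driving, radio, texting]):
        names = ", ".join(t.name for t in scenario)
        print(f"{names}: error multiplier {error_multiplier(scenario):.1f}")
```

Note that the multiplier applies to the whole set of tasks. The model deliberately has no notion of priority, because, as described above, users do not consciously choose which task degrades.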
Cognitive load relates directly to a user’s comprehension and ease of understanding. For example, there is much evidence that graphical displays induce less cognitive strain, even when the data they display does not relate to an overlaid display or to working within a particular environment. Ideally, the NHTSA would have written its guidelines with this in mind: not focusing on the amount of text a user is reading, but encouraging the use of graphics and items that relate to the environment whenever possible.
But at least the research does exist elsewhere, so you can make up your own mind and use analytical and research methods to find out what is really distracting or dangerous about your user interface.
As you can see, keeping a user’s eyes on the road is not the overriding issue government standards make it out to be. You have to work to keep all of the pressures of cognitive loading in mind, and do what you can to keep the user’s mind on the road as well.
While it’s still a little early to settle on design patterns for augmented-reality and consumer head-up display solutions, we’re well into understanding key guidelines and principles.
How can you tell whether an HUD or AR application is right for your situation? Consider such a solution when:
- The scene is not complex and has little or no lateral movement.
- Focusing on or understanding the environment is key.
- Add-on information that is not available in the natural world is nonetheless fundamentally related to the environment.
- Pointing or changes in the direction of the view can easily and instantly change the items displayed.
How can you design transparent or AR applications?
- Make sure all of the elements that you display relate to a user’s environment.
- Ensure that users can understand that the displayed items relate to the scene or environment.
- Display items that relate to the dimensionality of the scene, whether in a virtual or real world.
- Use graphical displays rather than text whenever possible.
- Display elements that users can easily comprehend at a glance—not just view—in under two tenths of a second.
What shouldn’t you do when designing transparent or AR displays? Do not:
- Display large amounts of text.
- Present information that is unrelated to the scene or environment.
- Display interface elements that easily get lost in complex scenes or against common backgrounds.
- Require users to do math or otherwise interpret data.
- Display any other items that induce high cognitive loads.
- Display items that are difficult for users to fixate on, for example, because they have odd shapes or poor contrast.
- Display items exclusively to one eye—or so far to one side that they present dominance issues.
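One way to keep the do and don’t lists above in front of a team is to turn them into a simple review checklist for each element of a display. The sketch below is a hypothetical example: the `OverlayElement` fields and the word-count threshold are assumptions, and only the 0.2-second glance limit comes from the guidelines above. Values such as glance time would have to come from testing with real users, not from guesses.

```python
from dataclasses import dataclass

MAX_GLANCE_S = 0.2  # "under two tenths of a second", from the guidelines above
MAX_WORDS = 3       # assumed cutoff for "large amounts of text"


@dataclass(frozen=True)
class OverlayElement:
    name: str
    relates_to_scene: bool
    word_count: int = 0
    glance_time_s: float = 0.1      # assumed to be measured in user testing
    needs_mental_math: bool = False
    single_eye_only: bool = False


def review(element):
    """Return a list of guideline problems for one overlay element."""
    problems = []
    if not element.relates_to_scene:
        problems.append("unrelated to the scene or environment")
    if element.word_count > MAX_WORDS:
        problems.append("too much text; consider a graphic instead")
    if element.glance_time_s >= MAX_GLANCE_S:
        problems.append("cannot be comprehended at a glance")
    if element.needs_mental_math:
        problems.append("requires math or interpretation")
    if element.single_eye_only:
        problems.append("displayed to one eye only")
    return problems


if __name__ == "__main__":
    elements = [
        OverlayElement("stopping-distance bar", relates_to_scene=True),
        OverlayElement("email preview", relates_to_scene=False,
                       word_count=40, glance_time_s=2.0),
    ]
    for element in elements:
        print(element.name, "->", review(element) or "no problems found")
```

A checklist like this does not replace observation in context. It only catches the violations that can be stated up front.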
Here is a very small selection of useful research reports that provide nice overviews of the design problems that I’ve discussed in this article. If you are a member of ACM, you can read these articles in their entirety in the ACM Digital Library. There are many, many others and entire books that cover these topics as well.
Albers, Michael J. “Tapping as a Measure of Cognitive Load and Website Usability.” In Proceedings of the 29th ACM International Conference on Design of Communication (SIGDOC ’11), 2011.
Bach, Kenneth Majlund, Mads Gregers Jæger, Mikael B. Skov, and Nils Gram Thomassen. “Interacting with In-Vehicle Systems: Understanding, Measuring, and Evaluating Attention.” In Proceedings of the 23rd British HCI Group Annual Conference on People and Computers: Celebrating People and Technology (BCS-HCI ’09), 2009.
Laramee, Robert S., and Colin Ware. “Rivalry and Interference with a Head-Mounted Display.” ACM Transactions on Computer-Human Interaction, September 2002.
Tönnis, Marcus, Christian Lange, and Gudrun Klinker. “Visual Longitudinal and Lateral Driving Assistance in the Head-Up Display of Cars.” In Proceedings of the 2007 6th IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR ’07), 2007.