Infusing Usability Testing with Reality

By Michael Hawley

Published: October 18, 2010

“A business sponsor and observer asked me about my line of questioning: ‘What do you expect…?’”

I recently conducted some user research on a proposed experience for a Help and value-added learning center for a Web application. The goals of the study were as follows:

  • Assess how well our proposed designs would align with user needs.
  • Understand how the new branding for the section would impact the user experience.
  • Understand how well a proposed conceptual approach to information categorization would support information seeking.

The setup for this study was similar to that for any typical usability study. We invited people to participate in one-on-one sessions with a moderator and asked participants to complete a series of tasks while using the think-aloud protocol. Project team members, including designers and business sponsors, watched from another room.

We wanted to gain the best possible understanding of the entirety of the proposed user experience, including branded words for labels, information architecture, and categorization. Therefore, during the course of the sessions, I asked participants to describe what they expected to see in a section or on a page behind a link before they clicked it. I thought this would help me to understand the users’ mindsets coming into the experience.

But at the end of one day of testing, a business sponsor and observer asked me about my line of questioning: “What do you expect…?” She thought it was odd that I had asked that question, because that’s not how people think when browsing or using the Web in real life. She also suggested that my asking that question could potentially introduce bias into the study, because participants would stop to really think about what they might expect. Although I was happy with the information we were getting from participants, her comments did make me wonder: should I make the testing more real?

Summative Testing

“For summative testing, … I limit interruptions and deep, probing questions and focus mostly on watching behavior as participants complete tasks. I leverage the think-aloud protocol for simple explanations of that behavior.”

If the goals of the study were different, I would likely agree that interrupting users during tasks might not lead to the best results. For summative testing, where the goal is to identify usability issues that would prevent users from completing tasks or to develop a benchmark for the efficiency of a site, I limit interruptions and deep, probing questions and focus mostly on watching behavior as participants complete tasks. I leverage the think-aloud protocol for simple explanations of that behavior.

However, even though I might minimize probing questions during a task, I don’t strive to make the situation more real from a context perspective. For summative testing, attempting to simulate a real scenario is not that critical. Rather, it is more important to recruit participants who match the profile of the target audience and track how those users accomplish defined tasks.

When developing the scenarios that often accompany tasks in a usability test, moderators sometimes draft elaborate scenarios to help get participants in the right frame of mind for each task. But, if we’ve recruited the right participants and our goal is to find usability problems with a given interaction, we should be able to simply ask participants to complete the task and observe their behavior.

A scenario and the background information participants use in completing a task should be the minimum necessary to get the task done. A mentor of mine once commented on the desired level of reality for this type of summative testing. He noted that we bring participants into an unrealistic environment (a usability lab), have moderators watching over their shoulders, and promise them money for their cooperation. Why would we expect that we can make the situation seem real? Better to simply ask participants to complete a task and observe where there are issues.

Formative Testing

“During formative testing, while I am interested in getting participants’ genuine responses to the designs and the questions I ask them, I’m not necessarily looking to make the testing situation real….”

The usability testing I was doing for the learning center project I described earlier was formative testing. By this, I mean usability testing that occurs early in the product design process, whose goal is not primarily to find usability problems that need to be fixed, but to assess the overall user experience and understand users’ reactions to different ideas. Often, this includes comparative testing—that is, soliciting feedback on multiple solutions to a design problem. When done early in the design process, formative testing informs decisions about our design direction and general approach to interaction design and information architecture solutions.

During formative testing, while I am interested in getting participants’ genuine responses to the designs and the questions I ask them, I’m not necessarily looking to make the testing situation real—as the business sponsor on my project suggested. Rather, I am interested in using any technique or line of questioning at my disposal to elicit information about the target audience and gain insights that will inform design decisions.

By asking a question such as “What do you expect to happen when you click…?” I obtain valuable information about the mindsets of potential users and their reactions to certain labels or design artifacts. Of course, to ensure that my perceptions don’t get skewed by one or two individuals, it’s important that I talk with multiple participants. When it’s done effectively, interrupting participants to ask probing questions or having in-depth discussions with them can lead to insights that allow us to be truly empathetic designers.

A Dose of Reality

“I find it incredibly helpful when other team members observe test sessions, because it gives them an eye-opening experience and establishes a foundation and common ground for the whole project team to build on.”

Although I’ve stated that there is only a limited need to create realistic environments for participants during summative or formative usability testing, I do realize that there is room for a dose of reality.

First, the reaction of our project’s business sponsor was real. I was happy that she had observed the usability test sessions. I find it incredibly helpful when other team members observe test sessions, because it gives them an eye-opening experience and establishes a foundation and common ground for the whole project team to build on. If she was skeptical about my testing methodology, it would be in my best interest to adjust the protocol so she would buy into the process, even if I had academic justifications for why it was okay to interrupt tasks with probing questions about participants’ expectations.

Second, there is always a danger of introducing bias during any line of questioning. Even the most experienced moderator can never run a perfectly unbiased session. With this in mind, it is always worthwhile to observe participants’ unfiltered, uninterrupted experience for at least some part of a session.

Finally, in circumstances where motive and incentive are important, providing participants with a realistic scenario can actually serve to minimize the bias of the artificial environment that exists during a usability test. For example, when studying an ecommerce environment, it’s good practice to incent participants by letting them choose their own merchandise during a shopping experience.

For the learning center project I described earlier—as well as for most applications other than ecommerce applications—the best way I’ve found to introduce realistic situations into a usability study is to allow participants to perform unguided, self-motivated tasks at the outset of a session. Before introducing the set of predefined tasks I want to study, I ask participants to describe several things they would want to do with the product. Then, even if the product’s design or prototype does not support those tasks, I ask them to use the design to achieve those goals.

By introducing reality in this way, I am able to understand participants’ mindsets before they’ve used a design, develop a set of top-priority tasks that can inform decisions about visual hierarchy and information architecture, and get an unbiased understanding of participants’ first reactions to a product’s overall design or a specific interaction. With this initial user experience as a foundation, I can then turn to the specific tasks I want to study and probe more deeply for participants’ specific expectations and reactions.

Conclusion

“During formative testing, a moderator can balance the need to probe on expectations with the desire to get more natural responses by introducing participant-driven tasks at the beginning of a usability session.”

It’s clear that a moderator’s interrupting participants’ tasks with probing questions such as “What do you expect…?” during usability studies can make participants’ experience seem a bit unrealistic. However, from a UX designer’s perspective, this may be acceptable if the goal is to develop insights that inform the overall design approaches for a product. During formative testing, a moderator can balance the need to probe on expectations with the desire to get more natural responses by introducing participant-driven tasks at the beginning of a usability session. It’s an easy way to get the best of both worlds.

8 Comments

It’s certainly true that creating a realistic testing environment is not necessary for testing many types of task, but for more detailed tasks, I’ve often found it necessary to ensure testers act as naturally as possible for the most insightful results.

A couple of strategies we’ve tried successfully are:

  • Misdirect their thinking away from how they’re using a site by saying you’re timing users’ route to purchase or getting their opinions about how the products are presented.
  • Rather than asking them to narrate their behavior during the task—this is very unnatural and actually makes them think about what they’re doing far more than they ordinarily would—get them to voiceover the video when they watch it back afterward.

Michael, I rarely use the think-aloud protocol in usability testing. It makes time on task meaningless; encourages users to do better than they might normally, because they’re thinking about the task with more intent; and also not all participants are good at it. I’m sure you’ve had the participant who just says aloud what they’re doing instead of what they’re thinking.

As you say, it’s better used in formative testing than summative testing. Perhaps, in the situation you describe, the lesson learned is also to set appropriate expectations for observers, so they understand why you’re taking a particular approach.

Another technique I’ve used is to do the probes after the task has been completed or step the user back to that point in the task. That way you still have your task data, but can get the valuable qualitative feedback you want as well.

I never use the expectations question before something comes up. Like you say, it’s too disruptive.

Also, it’s putting a lot of the burden on the user. They’re typically not sophisticated enough to tell you, “Well, I’m expecting a comparison chart, with the features on the left and the products at the top—you know, like all your competitors have. Oh, and there’d better be some links from the products to the product-specific pages, because I will want to do more in-depth research. Plus, I’ll want to keep my top-level nav, but you can dispense with the local nav. Oh, and one final thing, can you make it so there’s no horizontal scrolling, and also some kind of feature so I can narrow down my choices?”

Users are very good, however, at “knowing it when they see it.” Thus, I find the expectations question useful only when the link’s been clicked and the new page has come up.

Note, though, that this prompt usually isn’t necessary. Users will typically tell you when something doesn’t jibe with what they expected. It’s really only useful when the user is obviously confused, as shown through body language, but isn’t articulating or is having a hard time articulating what they’re confused about.

If you were using this prompt consistently before anything came up, your observer was right. You were doing more of an interview than a traditional task-based, think-aloud, formative test. That’s all that really seemed to be going on here. Introducing the formative versus summative aspect, though a very important topic, seems a bit of a red herring here. (The same thing can be said for self-directed tasks.)

I’ve actually found expectations questions very useful, especially for early concepts. We’ve learned a lot both about people’s hopes and about miscommunications in how we present a feature. Here’s an example:

On a healthcare site, there is a list of questions patients should ask their doctors at various key points such as deciding on a treatment option. These are meant to be cheat sheets they can bring to an office visit to help them remember what to ask. Testing this feature on a mockup for a mobile app, we discovered that, when people saw the link “Questions to ask your doctor” and even “Prepare to talk to your doctor,” they thought it would open a chat session with a real doctor. I think I would have found this in their disappointed reaction to what they found, but asking about their expectations made it very clear.

For some participants, it became a sort of game: How well can this site guess what I really want? In some cases, what they found was also useful. This gave me the opportunity to ask how they would label the actual feature.

If you are testing prototypes with limited interactivity, asking about expectations reduces the difference between working and nonworking buttons, making the session less of a hunt-for-the-hot-spot exercise.

Finally, it’s a great way to have a conversation about ideas that aren’t even prototyped. On a university Web site, we were interested in how part-time students integrated their course calendar with their other life. All we did was add a nonworking button to the page that suggested a feature to download or merge calendars. We got a lot of comments without even having to ask the question or point out the button.

It all just goes to show that there is not one right way to run user research or usability testing. It takes sensitivity to the topic and context.

Stu and Jamie, thanks for your comments. You both mentioned retrospective questioning or probing on tasks. The two variations you mentioned are asking participants about their experience immediately following a task or having them watch a video of their experience and comment on it. I think these are great suggestions and most in line with what I called summative testing in this article. As you mention, if you have a site or prototype that can support full exploration of tasks and your goal is to get an unfiltered view of how a user would interact with a design, this is a great option. It can also help avoid the situation where a participant says one thing, but their behavior indicates another. Often a stakeholder or observer will pick up on what they said and not catch the nuance of how they reacted differently. If this is a concern in a study, separating the think-aloud protocol from the behavioral observation would be helpful.

Stu also noted that this can be especially helpful in providing participants with a realistic motivation—getting a task done quickly. For purchase and retail tasks in particular, you can combine this with an incentive model where the participant is compensated with a real purchase from the site. Not applicable to every environment, but helpful when you can use it.

Thanks again for your comments and moving the discussion forward.

Cliff,

Thanks for your comments and perspective on this. I would agree with your assessment that non-designer test participants are generally not good at communicating the specifics of page layouts and interactions, which are often the details designers are most interested in. Certainly, if every click were preceded by a “What do you expect?” question, that would be interruptive and difficult for a participant to follow, and would not lead toward good results.

Adding to your point, participants should not be seen as designers. It is very tempting to ask a participant, “How do you think this should work?” But there are two problems with this. First, participants are not designers. The job of usability analysts and designers should be to observe behavior and determine solutions based on their observations. Second, if stakeholder observers hear a design solution from a participant, they may take it and run with it, which is a difficult situation if the entirety of the research indicates a different solution.

As I put together this UXmatters post, one of the ideas I had for the name was reality in a “User Study” rather than “Usability Testing.” As you describe, a large part of the project was to assess the overall impression of the site with regard to meeting customer needs and alignment of new brand terminology with customers’ vocabulary. For key moments in the study, using the expectation or key needs question provided the opportunity to uncover participants’ perspective on these branded terms and the intended content and functionality of the site. For example, if a participant desired a comparison chart or a link or button name suggested a comparison chart, and the content was different, we wanted to understand the implications. Leveraging a mid-fidelity prototype during this user study allowed us to get some usability findings and additional needs and brand analysis completed.

Whitney,

Thanks for your comments and examples. I agree with the overall assessment that a testing or research protocol should be driven by the topic and context: it depends. This is in line with the other comments on this article that suggest various needs and approaches to user studies. Understanding these nuances is what helps define usability professionals.

You describe a scenario where I have found the expectation question to be quite valuable. Specifically, early in the design process, I often find that I have rough artifacts to test with. In the case of Web pages or Web apps, this could mean a few screens with only key scenarios actually built out. During these studies, if a user indicates that they would click a link or button that is not coded or working, I might ask them what they would expect to find. This not only informs changes to the existing screen, but also gives me information about needs for the function that is behind that link.

As mentioned in a previous thread, the key would seem to be balancing when you ask for expectations and when you don’t. Asking the “What do you expect?” question too often would certainly disrupt the flow of a study. But at key points of a study early in the design process, it can provide key insights into not-yet-developed areas or inform changes to nomenclature and application-specific language.

uxfindings.blogspot.com: I love the idea of having them just go through the task as they are recorded, then having them do the voiceover afterward. That way, their experience is unadulterated the first time. It’s like being able to conduct quantitative and qualitative research, all with the same user. Nice idea.

Stu Marlow.
