Talking Out Loud Is Not the Same as Thinking Aloud
Published: March 20, 2012
A recent Alertbox by Jakob Nielsen, “Thinking Aloud: The #1 Usability Tool,” reaffirmed the value of the thinking-aloud protocol as a usability-testing approach. I couldn’t agree more. But there is a big difference between someone’s thinking out loud about a task they are doing and someone’s voicing their opinion about a design. The first is very valuable; the second is so-so at best and dangerous at worst.
Here is the kind of data you want to get from a user who is thinking aloud:
- “I want to do…”
- “I’m looking at the UI, and I think it does…”
- “Hmm, that’s not what I expected; I thought it was going to…”
- “That took longer than I expected.”
In short, you want to learn how a user sees her task and how she is making sense of a user interface in terms of that task.
What you don’t need to hear are comments like:
- “I think the background should be blue.”
- “I don’t think other users are going to understand this…”
- “I would put this field over there.”
For one thing, the latter kind of information does not enlighten you about the problem space. It just adds one more opinion into a mix that probably has no shortage of opinions already. At best, it’s a distraction; at worst, it leads developers to make bad design decisions based on what the “users told us they wanted.” In the first example, did the user who specified the color blue have a background in visual design? I’m amazed how quickly people are willing to overturn the opinion of our in-house visual design expert who has a degree from a leading design institution just because an accountant tells us he likes blue.
So how do you get users to give you the kind of feedback you need? I use two techniques to improve the think-aloud data that I get:
- Providing instructive practice at the beginning of a usability testing session
- Using operant conditioning techniques—specifically, reinforcement and extinction—during a session
Before I start the first task with a test participant, I explain the think-aloud protocol, then ask him to count the windows in his house while thinking aloud. I tell him, “I’m not really interested in how many windows you have, but I am interested in how you go about doing this task.” Some users will sit quietly, then say, “12.” I then point out that I have learned nothing about how they arrived at that answer. I ask them to try again, but this time, to work harder at thinking aloud. They try again, “Okay, in my kitchen, I have one over the sink, in the living room there are three, the den has one behind the couch….” At that point, I stop them. “Okay. Now I have some insight into how you’re solving the problem: you imagine yourself inside your house, then mentally go from room to room counting the windows, starting with the kitchen.” Usually that makes the proverbial light bulb go on, and participants say something like, “Oh, I see, you want me to think out loud.”
Another useful tip: have participants perform a safe, short task first. If someone is having trouble thinking aloud, you’ll have a chance to coach him further before getting into longer, meatier tasks.
During a usability test, I rely on two principles from B.F. Skinner’s operant conditioning: reinforcement and extinction. Most of us are pretty familiar with reinforcement, but extinction might be a new concept—or at least one you haven’t thought of including in your participant management toolkit.
Reinforcement is the use of praise or a reward to encourage someone to continue a certain behavior. For example, when a facilitator says, “You’re doing a great job,” a user is likely to try to do more of what the facilitator just praised.
Reinforce participants’ behavior only when they give you data, and praise them for exactly that: giving you data. Do not reinforce their giving you design suggestions.
For example, if a user tells me, “This instruction here is really confusing,” I first try to clarify where the confusion lies—that is, is it a word he doesn’t understand, some ambiguity, and so forth. Then, I usually say, “Thanks, that’s useful to know—that’s just the kind of data we are looking for.”
I do the same if a user wants a product to do something it doesn’t. “So you expected the application to automatically correct the word manger, making it manager, because manger wouldn’t make sense here. It’s helpful to know what you would like it to do.” Notice that I did not say, “That’s a good suggestion.” Why not? For one thing, I’ve got a room full of developers watching this test who know it’s an unreasonable expectation for a spelling checker—or at least for any spelling checker that wouldn’t put the product back into a lengthy development phase. If I said, “That’s a good suggestion,” I’d start to lose credibility with them. Also, they might start discounting all of the input that I get from this user as a way of discrediting that one suggestion.
And most important, if I were to praise a participant for making a design suggestion, what behavior would I be reinforcing? Making design suggestions! But I didn’t bring this person in to make design suggestions. I wanted to see how real people make sense of a user interface within the context of doing authentic tasks.
The next technique I’ll discuss is extinction. How do I deal with unwanted feedback such as style preferences and design suggestions? I do nothing at all. That is what extinction is: withholding the feedback that would reinforce an undesired behavior. A common application of extinction is giving a child who is acting out quiet time. The intent is not to punish the child, but to remove him from the environment that is reinforcing the unwanted behavior—typically, the extra attention he is getting from his teacher or his peers.
While reinforced behavior typically continues, behavior that does not get reinforced eventually goes away, or becomes extinct. So a typical exchange would go something like this:
Participant: I didn’t see the Submit button.
Me: So you didn’t know what to do when you finished the form because you didn’t see the Submit button. Thanks, that’s useful information.
Participant: Yeah, I think you should make it red.
Me: In this next task, we are going to ask you to…
My first response was positive reinforcement for the insight that the button had not captured the user’s attention.
However, my second response moved right along without acknowledging the design suggestion. Thanks to the first response, we know we may have a problem with users’ not noticing the Submit button. I’ll let our visual designer who has the degree from the well-respected design institution run with that.
I know there are some people who might want participants’ design ideas. But once you start down that path, participants stop sharing the thoughts that would give you insight into the problem space and instead start giving you their ideas for design solutions, because they begin to believe that’s why you brought them in. Why? Because you keep reinforcing that belief. When, in my earlier example, that participant said, “12 windows,” I knew his answer, but had no insight into how he got there. Again, I usually have no shortage of experts in a solution’s domain space; what I lack is a clear vision of how the problem looks from a user’s perspective.
Getting participants to think aloud turns their tacit behaviors—such as interpretation, planning, and reacting—into explicit behaviors that you can observe and analyze. For this reason, thinking aloud is a valuable tool in usability testing. But if participants start to believe that you are interested in their design suggestions, they’ll stop sharing the internal behaviors you want insight into. Instead, they’ll give you insights into how they would have designed something, not into how they are using it. They’ll become like the participant who merely said “12” during the count-the-windows exercise.
Clients who are eager to jump on any input they get from real users sometimes place too much emphasis on such design suggestions. It is the qualitative equivalent of a quantitative problem I also see: clients placing too much importance on descriptive statistics from small samples. “Oooooo, the average time to install went down 12% between the first version and this version—we’ve made it easier to install.” Maybe; maybe not. This makes me want to wave my hand and say, “These aren’t the droids you’re looking for.” So the more I can do as a moderator to keep participants focused on sharing how they are counting the windows, so to speak, and less on what number they come up with, the easier it is to keep my clients focused on obtaining truly valuable data.