
Conducting Qualitative, Comparative Usability Testing

Practical Usability

Moving toward a more usable world

A column by Jim Ross
March 6, 2017

Although UX designers usually consider various design directions early in a project, they typically choose just one design to develop further—long before conducting the first usability test. However, testing multiple designs early in a project can provide much more useful information than testing a single design solution. When participants can experience two or more designs during testing, they can provide better feedback. As a result, you gain greater insight into the elements of each design that work well and those that cause problems.

What Is Qualitative, Comparative Usability Testing?

When you read the term comparative usability testing, you might think it refers only to benchmarking the usability of an existing user interface against that of its competitors. In that type of comparative usability testing, you compare existing user interfaces with one another, using quantitative metrics such as task-completion rates, error rates, and time on task. Because you’re measuring performance, participants perform test tasks without interruption and do not think aloud. You might also compare participants’ responses to a questionnaire.


But, in this column, I’ll discuss qualitative, comparative usability testing, which has an entirely different purpose. By conducting qualitative, comparative usability testing early in the design cycle, you can assess the pros and cons of various design directions.

For this type of comparative testing, create lower-fidelity prototypes that include just enough functionality to compare the differences between particular design solutions. As in benchmark testing, you can compare task-completion rates, record the errors that occur, and note whether participants understand or are confused by particular aspects of the designs. But participants also think aloud and, during each test session, discuss with the facilitator the problems they encounter, as well as their opinions of the different designs. This type of qualitative, comparative testing gives you a good understanding of which elements of each design work well and which cause problems—before you commit to a particular design direction.

Why Conduct Qualitative, Comparative Usability Testing?

Let’s look at the benefits of conducting qualitative, comparative usability testing to evaluate different design directions early in the design process.

Participants Provide Better Feedback

When you give participants the opportunity to experience multiple design solutions, they provide more useful feedback. You’ll notice the difference immediately when you conduct a qualitative, comparative test. After participants use the first solution, they’ll usually give you only mildly positive or negative feedback on it. Unless they encounter particularly annoying problems, you’ll probably hear something fairly neutral—such as, “It’s okay” or “It’s pretty good.” Participants usually have a hard time imagining how something could be designed differently, so if a solution works and lets them complete their tasks, it seems fine to them.

However, once participants use a second or third design solution and have something to compare to the first one, their responses suddenly become much more animated. Seeing two or more ways of doing the same tasks makes it much easier for participants to contrast and discuss the differences among the design solutions.

You’ll Have Data to Compare

In addition to participants’ comments and opinions, you can collect quantitative data such as task-completion rates and error rates, as well as questionnaire data. If you test only one design and have data only for that design, you have to decide which task-completion rates, error rates, and questionnaire averages count as success and which indicate problems. However, when you have data from testing two or more designs, you can compare the data to determine whether one solution is clearly more successful than another.
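
To make this concrete, here’s a minimal Python sketch, using entirely hypothetical task data, that tallies the same metrics for two designs so you can read them side by side rather than judging one design’s numbers in isolation:

    # Hypothetical task outcomes for the same five tasks under two designs.
    # completed holds True or False per task; errors counts mistakes per task.
    results = {
        "Design A": {"completed": [True, True, False, True, False],
                     "errors":    [0, 1, 3, 0, 2]},
        "Design B": {"completed": [True, True, True, True, False],
                     "errors":    [0, 0, 1, 0, 1]},
    }

    for design, data in results.items():
        completion_rate = sum(data["completed"]) / len(data["completed"])
        mean_errors = sum(data["errors"]) / len(data["errors"])
        print(f"{design}: {completion_rate:.0%} task completion, "
              f"{mean_errors:.1f} errors per task")

Even this trivial tally shows why comparison helps: a 60-percent completion rate means little on its own, but next to 80 percent for the alternative design, it points to a problem worth investigating.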

Designers Can Defer Design Decisions Until After Testing

Designers naturally begin by considering various design directions. When designers know you’ll be testing multiple design solutions, it encourages them to keep their options open until after testing. Thus, qualitative, comparative usability testing prevents designers from settling on a particular design prematurely, without adequately considering the alternatives. In contrast, if designers commit to a single design before usability testing, they may never receive the rich, comparative feedback that could have informed their design decisions or, at best, they’ll receive it after they’ve already made their decisions. At that point, it would be difficult to go back and drastically change the design.

When designing and prototyping two or more different solutions, you’ll have less time to build out deep prototypes of each one. The need to focus on the most important screens and interactions prevents you from exploring any one design direction too deeply before getting user feedback.

Designers Can Test Their Own Designs

When designers test their own designs, they are sometimes biased or too invested in their designs. If you test only a single design, it might seem that the findings should enable you to validate or reject that one design. However, when you test several different designs, testing is more about discovering which elements of each design work best. So the focus is less on evaluating the work of the designer and more on comparing the pros and cons of different design options.

You Can Use Testing to Resolve Design Disputes

Clients, stakeholders, and project team members often have different opinions about designs that can be difficult to resolve. Qualitative, comparative testing can help you to resolve such disagreements because it lets you see how different solutions actually perform. Leaving the decision up to the users can be an acceptable way of resolving disputes.

How to Conduct Qualitative, Comparative Usability Testing

Testing multiple design solutions requires you to consider a few additional factors that don’t come into play when testing a single design.

Avoid Testing Too Many Different Designs

Be wary of testing too many different designs. Testing two or three different design solutions is ideal. With more than three designs, testing becomes too complex for both you and the participants. People have difficulty remembering and comparing more than three designs. Plus, the more designs you test, the fewer task flows you can include in a study.

Test Only Designs That Actually Are Different

There should be obvious differences between the designs you test. Otherwise, what’s the purpose of testing them? If the differences are too subtle, participants may have trouble perceiving them at all, which makes it hard for them to compare the designs and form opinions about the differences. If designers have created designs that differ only slightly from one another, they should make an effort to come up with more varied design solutions before testing them with users.

Don’t Make Every Decision by Testing

Be mindful that you should not postpone making too many design decisions until after you’ve done qualitative, comparative usability testing. If the answer to every project-team dispute or question is, “Let’s test it with users,” that’s a sign the team is overrelying on testing when making design decisions. The purpose of comparative usability testing isn’t to let users make every design decision. It’s about gathering information so you can evaluate and choose the best aspects of two or more designs. You should determine in advance which elements of the designs are most important to compare.

Don’t Create an Alternative Design Just to Have Another to Test

Use qualitative, comparative usability testing only when you actually have different designs to test. Although trying out different design directions is usually ideal, sometimes there really is just one good way to design something. For example, if you’re following a common design pattern or convention, it wouldn’t make sense to do something different, unless you can radically improve on it. So, if you don’t have different designs to test, don’t force designers to create alternative designs just because you feel you should do comparative testing.

Have All Participants Use All Design Solutions

Although the quantitative data—including task-completion and error rates—that you gather is important, the qualitative data that you get from having participants think aloud and compare their experiences with each design solution is equally important. You’ll get the best qualitative data when participants can try the same tasks using multiple design solutions, then talk about the differences between them. Therefore, having all participants experience all design solutions is the best method of testing multiple designs early in a project.

In contrast, having each participant complete a task using only one design solution is more appropriate when conducting competitive usability testing, in which you compare your current user interface to those of competitors. In this type of benchmark testing, gathering quantitative metrics is most important, so you should avoid the order effects that would occur if participants performed the same tasks using different design solutions. Since participants’ qualitative opinions are less important, you don’t need each participant to experience and compare each design.

However, if participants must perform the same tasks using different solutions, be sure to counterbalance the order in which participants experience them to avoid order effects. For example, have Participant 1 experience Solution A, B, then C; Participant 2, Solution B, C, then A; and Participant 3, Solution C, A, then B. This rotation avoids the problem of tasks becoming easier with each subsequent solution participants experience.
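
If you’re scheduling many sessions, you can generate this rotation programmatically. The following Python sketch, using hypothetical solution labels, produces the same rotation as the example above. Note that a simple rotation balances each solution’s position across participants but not every possible sequence; for full counterbalancing, you’d rotate through all permutations of the order.

    def rotated_orders(solutions, num_participants):
        # Rotate the starting solution by one for each successive participant,
        # so each solution appears in each position equally often.
        n = len(solutions)
        for p in range(num_participants):
            yield [solutions[(p + i) % n] for i in range(n)]

    # Example: three design solutions, six participants.
    orders = rotated_orders(["A", "B", "C"], 6)
    for participant, order in enumerate(orders, start=1):
        print(f"Participant {participant}: {', then '.join(order)}")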

Choose the Most Important Tasks to Test

Testing multiple design solutions takes more time, which usually limits the number of tasks and questions you can include in a study. Limit the tasks to those that are most important to test. For example, if you ask participants to perform the same five tasks using three different solutions, that’s actually 15 tasks. Plus, you need to save a significant amount of time at the end of each session for participants to compare and discuss the three different designs. Carefully choose the most important tasks and questions so you won’t overwhelm participants or make them rush through the session to try to fit everything in.

Gather Participants’ Opinions at the Appropriate Moments

When comparing different solutions, it’s often best to ask participants questions at three different points during testing, as follows:

  • after each task
  • at the end of testing each solution
  • at the end of the test session, when comparing all designs

After Each Task

After participants complete each task, ask them general questions about how they felt about that task, as well as specific questions about their actions or the problems they encountered. By asking these questions immediately after each task, you can capture this information while it’s still fresh in their minds.

At the End of Testing Each Solution

Once participants have completed all of the tasks using a particular design solution, ask them general questions about their overall opinions of the solution they’ve just experienced. When you get to the second and third solutions, participants will naturally start making comparisons with the previous designs, whether you ask them to or not. For example, they might say, “Oh, I like this one much better than the first one!” Regardless of whether you want them to make these comparisons at that moment, it’s best to be flexible and let participants share their opinions while they’re uppermost in their minds.

At the End of the Test Session

Once participants have completed the entire test session, ask them to look back at each design solution, compare them, and discuss their relative merits or issues. Show them each design again, to refresh their memory, and allow them to review the designs again as they answer your questions. Otherwise, it can be difficult for participants to accurately remember the differences between the various designs.

To ease your discussions, give each design solution a generic name such as Solution A, Solution B, and Solution C. At this point, you can ask participants questions about how the designs compare and their overall preferences and opinions. Gathering information about what aspects of each design they like or dislike is much more important than learning which solution they prefer overall. You’ll often find that there is no clear winner among the designs, but you can discover which aspects of each solution work well or poorly. The designer may then be able to combine the best elements of each solution.

Gather Ratings in Addition to Opinions

While participants’ qualitative comments and opinions are important, they can sometimes be hard to compare. For example, a participant might give you these opinions regarding the different solutions:

  • Solution A: “I like this one.”
  • Solution B: “This one’s pretty good, too.”
  • Solution C: “I guess this one’s not too bad.”

While you can ask additional probing questions to get more detail, how can you compare such opinions about these three solutions? And how do you compare one participant’s feedback, “It’s pretty good,” to another participant’s, “It’s okay”? Clearly, one comment is slightly more positive than the other, but by how much?

In addition to asking for participants’ opinions, ask them to provide ratings for the various design solutions. You could do this using a questionnaire such as the System Usability Scale (SUS) or devise your own Likert scale. Even with a small number of participants, you can compare average ratings across participants, as well as the different ratings from a single participant. For example, if a participant gave Solution A a rating of 6 out of 7 and said, “I like this one,” while she gave Solution B a rating of 4 out of 7 and said, “This one’s pretty good, too,” the numbers provide a better comparison than those two statements. Similarly, if Solution A received an average rating of 4.3; Solution B, 6.1; and Solution C, 5.4, those numbers are much easier to compare than the participants’ qualitative statements.
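
If you use the SUS, remember that its ten responses must be converted to a 0–100 score rather than averaged directly: subtract 1 from each odd-numbered, positively worded item; subtract each even-numbered, negatively worded item from 5; then multiply the sum by 2.5. Here’s a minimal Python sketch of that standard calculation, applied to entirely hypothetical participant data, that produces mean scores you can compare across solutions:

    from statistics import mean

    def sus_score(responses):
        # Convert ten 1-5 SUS responses into a 0-100 score. Odd-numbered
        # items are positively worded; even-numbered items, negatively worded.
        if len(responses) != 10:
            raise ValueError("SUS requires exactly ten responses")
        adjusted = [(r - 1) if i % 2 == 1 else (5 - r)
                    for i, r in enumerate(responses, start=1)]
        return sum(adjusted) * 2.5

    # Hypothetical data: ten SUS responses per participant, per solution.
    ratings = {
        "Solution A": [[4, 2, 4, 3, 3, 2, 4, 3, 3, 2],
                       [3, 3, 3, 2, 4, 3, 3, 3, 4, 2]],
        "Solution B": [[5, 1, 5, 2, 4, 1, 5, 2, 4, 1],
                       [4, 2, 5, 1, 5, 2, 4, 1, 5, 2]],
    }

    for solution, participants in ratings.items():
        scores = [sus_score(r) for r in participants]
        print(f"{solution}: mean SUS score {mean(scores):.1f} "
              f"across {len(scores)} participants")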

Try Qualitative, Comparative Usability Testing

The next time you begin a project, plan to do some early usability testing on two or three different design solutions. When you plan for that up front, you’ll find that designers spend more time exploring different design possibilities. They’ll also tend to keep an open mind longer and avoid getting too attached to a particular design solution too soon. Testing multiple designs often reveals ideas you’d never considered and, thus, can help you come up with an even better design solution.

Principal UX Researcher at AnswerLab

Philadelphia, Pennsylvania, USA

Jim Ross

Jim has spent most of the 21st century researching and designing intuitive and satisfying user experiences. As a UX consultant, he has worked on Web sites, mobile apps, intranets, Web applications, software, and business applications for financial, pharmaceutical, medical, entertainment, retail, technology, and government clients. He has a Master of Science degree in Human-Computer Interaction from DePaul University.
