
Conversational AI Search Engines: Implications for Usability and the User Experience

Envisioning New Horizons

A critical look at UX design practice

A column by Silvia Podesta
March 4, 2024

In February 2024, Fast Company [1] reported on the rise of conversational AI search engines. Large language models (LLMs) power these tools, which can answer users’ questions by retrieving and summarizing information from the Internet.

Since the rise of generative AI, several conversational AI search applications have cropped up online. Academic and scientific research is spearheading a wave of experimentation in this field. [2] There is frenzied enthusiasm around what seems to be a new way of searching for content, one that provides an alternative to the Google model of search to which we’ve all become accustomed.


Although conversational AI search engines serve a user need as old as time, they not only go beyond the user experience of classic, keyword-based search engines but also bring about novel user behaviors.

After analyzing Perplexity AI, I’ve formed some thoughts on the usability considerations and wider implications of conversational AI search engines.

A New Mental Model for Search

Perplexity’s home page, which is shown in Figure 1, features many readily identifiable components of classic search engines: an input field for the user’s queries; the central body of the search-engine results page (SERP), which displays the results of these queries; and additional widgets or links that enrich the SERP by providing more information—similar to the results on Google’s SERP.

Figure 1—Perplexity’s home page

But the mental model underpinning this search user interface is rather different, as Figure 2 shows by contrasting the mental model for classic search engines with that of conversational AI search engines.

Figure 2—Mental models for classic and conversational AI search

The way in which users interact with Perplexity feels radically different from the user experience of its predecessors. The mental model is that of a conversational chatbot, a user interface that is now front of mind thanks to ChatGPT (Chat Generative Pre-trained Transformer). A lot of generative AI use cases revolve around conversational user experiences.

Another search engine, Andi, shown in Figure 3, takes this mental model to the extreme, in terms of its layout and interactions. However, its information architecture remains far more akin to that of a classic search engine, as I’ll discuss shortly.

Figure 3—Andi’s search user interface

Interestingly, from an innovation perspective, both of these examples represent a technology push, [3] in which a powerful new technology—in this case, generative AI—drives the evolution of the digital experience market.

However, the extent to which this evolution will benefit users still remains to be seen. Pushing out new products or services just because a shiny, new technology is available sometimes means ignoring the human focus and can carry considerable strategic risks.

Improving the Search Experience

Conversational AI search engines improve the usability and interactions of the search experience. The query example in Figure 1 shows some important elements that make interacting with Perplexity a pleasant, rewarding experience for the user.

First of all, the format of the search result matches the text on a Web page or in an article. Instead of providing a list of links to pages that could potentially contain answers to the user’s question—that is, the typical output of classic search experiences—Perplexity serves the answer directly, by collating snippets of information from multiple sources.
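
To make this retrieve-and-collate pattern concrete, here is a minimal sketch in Python. The toy corpus, the naive keyword-overlap scoring, and the snippet stitching are purely illustrative assumptions of mine; Perplexity’s actual pipeline is proprietary and far more sophisticated.

```python
# A minimal, illustrative retrieve-and-collate sketch. The corpus,
# the keyword-overlap scoring, and the snippet stitching are toy
# stand-ins, not any real product's pipeline.

from dataclasses import dataclass

@dataclass
class Source:
    title: str
    url: str
    text: str

CORPUS = [
    Source("Intro to LLMs", "https://example.com/llms",
           "Large language models generate text by predicting tokens."),
    Source("Search Engines 101", "https://example.com/search",
           "Classic search engines rank pages and return a list of links."),
    Source("AI Search Overview", "https://example.com/ai-search",
           "AI search engines retrieve passages and summarize them into one answer."),
]

def relevance(query: str, source: Source) -> int:
    # Naive relevance: count the query's words that appear in the source.
    words = set(query.lower().split())
    return sum(1 for word in words if word in source.text.lower())

def answer(query: str, top_k: int = 2) -> str:
    # Retrieve the top-k sources, then collate their snippets into a
    # single, directly served answer with numbered citations.
    ranked = sorted(CORPUS, key=lambda s: relevance(query, s), reverse=True)
    top = ranked[:top_k]
    body = " ".join(f"{s.text} [{i}]" for i, s in enumerate(top, 1))
    refs = "\n".join(f"[{i}] {s.title}: {s.url}" for i, s in enumerate(top, 1))
    return f"{body}\n\nSources:\n{refs}"

print(answer("how do AI search engines answer questions"))
```

In a production system, the keyword overlap would be replaced by semantic, vector-based retrieval, and an LLM would paraphrase the snippets rather than quoting them verbatim, which is precisely where the questions of trust and provenance that I discuss below come in.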

In both cases—classic search versus conversational AI search—the user’s goal is the same: the end point of the search experience is finding an answer within a text. However, the big difference is that, with AI search engines, the user doesn’t have to evaluate each individual search result, make guesses, or perhaps even open a new page and scan through its text.

This search model definitely simplifies things for the searcher and closely matches the Q&A pattern that is so typical of human conversations—thus meeting a crucial usability heuristic: match between the system and the real world.

In following Nielsen’s usability heuristics for user-interface design, AI search engines further enhance their user experience by doing the following:

  • Providing links to sources and related content is helpful because it gives users a sense of control.
  • Letting users ask follow-up questions is great not only in terms of user control but also in terms of error prevention. Users can rectify or disambiguate their queries and get more accurate results.
  • Structuring the answers by breaking a topic down into logical subtopics improves readability.
  • Microactions—such as those of the Share, Rewrite, and Copy buttons—are also beneficial to the search journey because they provide opportunities for customization. [4]

AI-powered search engines seem to tick the right boxes when it comes to improving the user interactions of the typical search journey. However, does this necessarily mean that they should become the next standard for the future of search?

Broader Implications of AI-Powered Search: Trust, Explainability, and Human Agency

When using Perplexity, it is hard to forget just how much decision making a user is relinquishing to the AI. This tool deploys generative technology to craft an answer from disparate pieces of content by selecting, extracting, and summarizing excerpts from the sources that it deems most relevant to the user’s search. In doing so, the application falls short of communicating why it has selected particular sources and not others, neglecting a crucial aspect of AI explainability, [5] which in turn ties back to the trustworthiness of the system.

In her article “Building Trust in Artificial Intelligence,” [6] IBM’s Global Ethics Leader Francesca Rossi writes: “[AI raises] some concerns, such as its ability to make important decisions in a way that humans would perceive as fair, to be aware and aligned to human values that are relevant to the problems being tackled, and the capability to explain its reasoning and decision-making.”

The risks that an untrustworthy AI system poses are now becoming a pressing concern in the enterprise world because of impending regulations and the prospect of huge fines, reputational damage, or lawsuits. However, the same cannot be said for individual Internauts—highly skilled, habitual users of the Internet.

As Gen Z expert Roberta Katz points out: “There is a saying about architecture that has long seemed true to me: first you make the building and then the building makes you. A light and airy house can create one kind of emotional and behavioral response for its residents and, similarly, a dark and heavy home can create an altogether different environment.” [7] Katz notes that, just as with physical buildings, the architectures of our apps and IT systems have a significant impact on the people who use them.

It’s fair, then, to hypothesize that one big danger of a tool such as Perplexity is that people could become accustomed to accepting the convenience of a ready-made, authoritative-sounding answer, without pausing to question its accuracy or its very truthfulness.

While it’s true that these tools provide links to the source documents, it’s difficult to ascertain the following:

  • the reliability of those same sources
  • whether the AI has discarded other, potentially better-qualified sources

While the first of these issues is also commonplace in traditional search engines, the second issue arises from the deeper delegation of decision-making that generally occurs when people use generative AI. Even if these tools provided full explainability, this easier user experience could impact users’ critical thinking in the long run—just as text messaging has allegedly contributed to sloppy writing.

From a usability perspective, sifting through findings on Google’s SERP to build answers ourselves, by patiently comparing and piecing together information from different articles, is surely more tedious and burdensome than quickly consuming a concise, authoritative-sounding summary. However, these tasks constitute healthy exercise for our analytical and creative skills, so they can discourage an uncritical over-reliance on AI models, at least to some extent.

If users’ mental model of a search engine becomes that of a tool that always knows the right answer, the implications for our ability to critically discern truth from falsehood could be huge. Of course, this could be unimportant when users are just searching for a new pair of trousers or a tutorial on how to remove grease stains from a shirt. However, it could be very important when they’re searching for more critical knowledge in fields such as policymaking, science, or civil society.

But it’s not necessary to build an AI search experience in this way. For example, Andi’s architecture doesn’t stray too far from the mental model of search that most of us are used to—despite its chatbot-like user interface. In the example query shown in Figure 3, the brief answer isn’t a generated summary but a snippet of text from the first relevant source, Wikipedia. The results comprise a list of all the sources, with links to their respective pages, that appears on the left side of the page. Conceptually, this model is not very dissimilar from a SERP—the layout aside.

Andi’s structure addresses another crucial point that the increasing popularity of these AI search applications raises—one that Kevin Roose of The New York Times has framed effectively: “If AI search engines can reliably summarize what’s happening in Gaza or tell users which toaster to buy, why would anyone visit a publisher’s Web site ever again?” [8]

At the very least, the big widgets of Andi’s app encourage users to click the links and visit the source Web sites, providing testimony to the power of UX and user-interface (UI) design in eliciting more desirable user behaviors.

The Importance of Trustworthy AI to the Future of AI Search Engines

Trustworthy AI is what really matters for the future of AI search engines. But, all things considered, the most interesting question is not: Will AI be the future of search?—because it most likely will.

Generative capabilities can enhance the user experience in numerous ways and help users get to knowledge faster. The following are better questions:

  • How can we make these AI search tools trustworthy, explainable, and reliable?
  • How do we make sure they don’t encourage the wrong user behaviors?

A study that Nature published [9] highlighted users’ conflicting views on AI science search engines: some researchers found the tools incredibly useful and accurate, while others lamented the inconsistency of the AI’s retrieval performance and their resulting lack of trust in these tools.

Since trust is the central issue potentially hindering AI adoption, there are two key areas of focus to consider, as follows:

  1. AI’s retrieval and summarization capabilities—This technology is currently evolving at an accelerated pace. As companies roll out new approaches to development and models of information retrieval, we can expect significant improvements in AI’s ability to connect the dots across disparate pieces of information and synthesize new insights from them. [10] (See the sketch following this list.)
  2. The deployment of explainability and other ethical practices relating to the trustworthiness of AI—If users are to adopt these systems, we must think about at least the following issues:
    • how to design transparent solutions that enable people to be aware of the model’s limitations and potential inaccuracies within the source documents
    • how to avoid both over-reliance and under-reliance on an AI’s suggestions, which could impact decision-making
    • how to strike the right balance between human and machine agency
    • how to make sure that any bias present in the data that is used to train and fine-tune the model doesn’t lead to skewed search results
    • how to prevent AI hallucinations—where an LLM perceives nonexistent objects or patterns that are imperceptible to humans and, thus, creates nonsensical or completely inaccurate outputs [11]
    • how to train a model that represents all the dimensions of the query topic
    • how to encourage users to adopt healthy critical-thinking practices when assessing AI answers or knowledge
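
As a thought experiment, the following sketch shows how a retrieval-augmented generation (RAG) step, the approach reference [10] describes, could keep provenance visible by returning the retrieved passages and their relevance scores alongside the generated answer. The llm_complete() stub, the prompt wording, and the score format are hypothetical placeholders of my own, not any product’s actual API.

```python
# A sketch of a retrieval-augmented generation (RAG) step that keeps
# provenance visible. llm_complete() is a placeholder for a real LLM
# call; the prompt wording and score format are hypothetical.

def llm_complete(prompt: str) -> str:
    # Stand-in for a hosted model API; a real system would call an LLM here.
    return "A summary grounded in the passages above, citing [1] and [2]."

def rag_answer(query: str, retrieved: list[tuple[float, str, str]]) -> str:
    # retrieved holds (relevance_score, url, passage) tuples from a
    # retriever. Echoing the scored context back to the user is one way
    # to show why each source was chosen.
    context = "\n".join(
        f"[{i}] (relevance {score:.2f}) {url}: {passage}"
        for i, (score, url, passage) in enumerate(retrieved, 1)
    )
    prompt = (
        "Answer the question using ONLY the numbered passages below, "
        f"and cite them by number.\n\n{context}\n\nQuestion: {query}"
    )
    return f"{llm_complete(prompt)}\n\nRetrieved context:\n{context}"

print(rag_answer(
    "What is retrieval-augmented generation?",
    [(0.91, "https://example.com/rag", "RAG pairs a retriever with a generator."),
     (0.74, "https://example.com/llms", "LLMs can hallucinate without grounding.")],
))
```

Surfacing the retrieval context in this way doesn’t make the summarization step fully explainable, but it gives users a concrete basis for judging why each source was chosen, which speaks to the transparency concern in the list above.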

As new regulations appear to ensure safer, more ethical uses of AI, the question of whether a good search experience is also the right experience becomes all the more relevant.

Within a context in which we’re delegating a big chunk of our decision making to LLMs, our design efforts should focus on the accuracy, trustworthiness, and comprehensiveness of search outputs. [12] Of course, this requires much more than building a super user-friendly conversational user interface.

References

[1] Ryan Broderick. “Does Anyone Even Want an AI Search Engine?” Fast Company, February 21, 2024. Retrieved February 24, 2024.

[2] Katharine Sanderson. “AI Science Search Engines Are Exploding in Number—Are They Any Good?” Nature, July 3, 2023. Retrieved February 16, 2024.

[3] Roberto Verganti. “Design-Driven Innovation: Changing the Rules of Competition by Radically Innovating What Things Mean.” Boston: Harvard Business Press, 2009.

[4] Jakob Nielsen. “10 Usability Heuristics for User Interface Design.” Nielsen Norman Group, April 24, 1994. Retrieved February 16, 2024.

[5] Vera Liao, Moninder Singh, Yunfeng Zhang, and Rachel Bellamy. “Introduction to Explainable AI.” IBM, May 8, 2021. Retrieved February 28, 2024.

[6] Francesca Rossi. “Building Trust in Artificial Intelligence.” Journal of International Affairs, Vol. 72, No. 1, Fall/Winter 2019. Retrieved February 14, 2024.

[7] Jules Naudet. “The Digital World Shapes New Social Structures and Conventions: Interview with Roberta Katz.” Books and Ideas, Collège de France, June 8, 2022. Retrieved February 14, 2024.

[8] Kevin Roose. “Can This A.I.-Powered Search Engine Replace Google? It Has for Me.” The New York Times, February 1, 2024. Retrieved February 16, 2024.

[9] Katharine Sanderson. “AI Science Search Engines Are Exploding in Number—Are They Any Good?” Nature, July 3, 2023. Retrieved February 16, 2024.

[10] Kim Martineau. “What Is Retrieval-Augmented Generation?” IBM Research Blog, August 22, 2023. Retrieved February 24, 2024.

[11] IBM. “What Are AI Hallucinations?” IBM, undated. Retrieved March 1, 2024.

[12] Jonathan Larson and Steven Truitt. “GraphRAG: Unlocking LLM Discovery on Narrative Private Data.” Microsoft Research Blog, February 13, 2024. Retrieved February 14, 2024.

Innovation Designer at IBM

Copenhagen, Denmark

Silvia Podesta

As a strategic designer and UX specialist at IBM, Silvia helps enterprises pursue human-centered innovation by leveraging new technologies and creating compelling user experiences. Silvia facilitates research, synthesizes product insights, and designs minimum-viable products (MVPs) that capture the potential of our technologies in addressing both user and business needs. Silvia is a passionate, independent UX researcher who focuses on the topics of digital humanism, change management, and service design.
