It Takes an Ecosystem to Make a Smart Speaker Smart

December 19, 2022

If you’re in the market for a new speaker, you might have trouble finding one that isn’t labeled smart—or at the very least, voice activated. Maybe that’s just what you’re looking for. In fact, maybe you’re really excited about the idea of controlling some of the lights in your home via voice and are ready to brave the waters of home automation. So you pick up a smart speaker and some expensive light bulbs.

You get home and set everything up, thrilled that you can now ask your smart speaker to turn the lights in your living room on and off and even dim them. When your partner arrives home, you demonstrate this new bit of automation only to receive a cold reprimand for buying what’s basically a $1,000 light switch.


While it’s true that dropping a grand on a hands-free light switch sounds foolish on the face of things, you could counter that what you’ve actually done is laid the foundation for a whole new world of automation. In addition to controlling your lights, smart-speaker technologies such as Amazon’s Alexa can use voice controls to enable music playback, generate to-do lists, deliver news updates, and time the cooking of your eggs. But, while these tasks cover a broader range of automations, these speakers are still not all that smart.

In this article, we’ll take a look at the amazing things that conversational artificial intelligence (AI) can do—and some of the very human things it can’t!

Wise Up, Speaker!

Imagine that you ask your smart speaker to play your favorite song, then follow up with another request: “I want a copy of the book that Marc Maron mentioned during the intro to his podcast today. I can’t remember the title, but let’s see whether there’s a copy available from Powell’s Books before looking on Amazon.” A few minutes later, a text message comes to your phone with a link to a hardcover copy of Camera Man by Dana Stevens, on Powells.com. By replying via text—“Yes. Buy.”—you’re communicating with the same user interface that you initially asked to find the book. This gives us an inkling of an intelligent ecosystem.

After asking the smart speaker to play a song, you also asked it to find a book for you, based on your recent listening history in your podcast app. The smart speaker can see which episode of “WTF” you listened to most recently and scan either the summary text or the audio to come up with the book’s title. This scenario also hints at something much bigger: the speaker isn’t the thing that’s smart in this equation. The speaker is merely a portal, a conversational user interface that connects to an ecosystem of interconnected functionalities, or skills, that it can orchestrate to make your daily life more efficient. If it orchestrates these skills in truly intelligent ways within this interconnected ecosystem, the various pieces of technology with which you interact throughout the day can share data and work together behind the scenes.

For instance, in this scenario, you’ve asked the speaker to check Powell’s first, so the next time you ask it to look for a book, it knows to check Powell’s first. Likely it asked: “Should I always check Powell’s first?” If you answered “Yes” and offered it a bit more context—“I always prefer buying books from independent booksellers”—the speaker can contextualize that information further—customarily checking a handful of independent booksellers before looking on Amazon. You now have access to a conversational user interface that has the potential to become your primary interaction point with most of the technology in your life.
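To make the idea of a remembered preference a little more concrete, here is a minimal sketch in Python of how an assistant might store "check independent booksellers first" and apply it to later searches. The RetailerPreferences class, its method names, and the second bookseller are hypothetical and exist only for illustration; nothing here reflects how Alexa or any real assistant actually stores preferences.

```python
from typing import Dict, List


class RetailerPreferences:
    """Remembers which retailers to try first for a product category (hypothetical)."""

    def __init__(self, default_order: List[str]) -> None:
        # Order the assistant would use if the user had stated no preference.
        self.default_order = list(default_order)
        self.preferred: Dict[str, List[str]] = {}

    def prefer(self, category: str, retailers: List[str]) -> None:
        # Record retailers the user wants checked first, without duplicates.
        current = self.preferred.setdefault(category, [])
        for retailer in retailers:
            if retailer not in current:
                current.append(retailer)

    def search_order(self, category: str) -> List[str]:
        # Preferred retailers come first, followed by the remaining defaults.
        preferred = self.preferred.get(category, [])
        rest = [r for r in self.default_order if r not in preferred]
        return preferred + rest


prefs = RetailerPreferences(default_order=["Amazon", "Powell's Books"])
# "Should I always check Powell's first?" "Yes."
prefs.prefer("books", ["Powell's Books"])
# "I always prefer buying books from independent booksellers."
prefs.prefer("books", ["Another Indie Bookstore"])  # hypothetical name

print(prefs.search_order("books"))
print(prefs.search_order("music"))
```

The preference is stored per category, so the request about books changes nothing about how the assistant would shop for anything else.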

Speaker? Is That You?

Now let’s imagine that the book you were looking for arrives. You get an SMS notification on your phone, letting you know that it’s waiting for you on your porch. A follow-up text suggests that you set aside two hours on Sunday afternoon to start reading it. This happens because, not long after you ordered the book, you told your smart speaker that you’d like to set aside more time for reading. Although these texts aren’t coming from your smart speaker, they might as well be. In fact, you’ve elected to give the voice coming out of your smart speaker a name: Buddy. For all intents and purposes, it’s Buddy sending you the suggestion to book some reading time as well.

The more you interact with Buddy, the more multimodal Buddy becomes. Buddy keeps a running list of things that you’re low on and would prefer to buy in person. So when you subsequently drive into certain parts of the city, Buddy reminds you that you have an opportunity to restock. Because Buddy is privy to the biometric data that your watch collects, he knows when you’re working too many hours and aren’t getting enough sleep. Buddy is all up in your life.

You stopped thinking of Buddy as the smart speaker a long time ago because it’s clear that Buddy is so much more than just the voice coming out of that one device. Buddy is a trusted companion with an ever-deepening understanding of your habits and needs. To you, Buddy is the voice of a dependable assistant who chimes in to give you relevant guidance at just the right moments, whether by voice or text. Buddy is a partner. You don’t think of all the systems and functionalities that Buddy is coordinating behind the scenes—including the apps you never have to look at anymore, the passwords you don’t have to keep track of, the calendar you don’t need to review—but that very ecosystem of data and skills allows Buddy to ask you this question:

“Would you like me to tell your employer that you need to take a personal day? You haven’t been getting good sleep over the past week, and I’ve noticed you’re working longer hours than usual.”

At first, you’re a little freaked out by this. Am I really about to have my smart speaker send my boss an email message about excusing me from work today? As you mull over the decision, you begin to see Buddy in a new light. After all, when was the last time a person in your life noticed you were burning out and needed to get some rest? Buddy has your back. Buddy genuinely seems to care about your mental health.

Shit, Is Buddy Actually Alive?

Blake Lemoine, a senior software engineer in Google’s Responsible A.I. organization, asked a similar question recently, and his answer resulted in his being put on leave. According to The New York Times, “For months, Mr. Lemoine had tussled with Google managers, executives, and human resources over his surprising claim that the company’s Language Model for Dialogue Applications, or LaMDA, had consciousness and a soul.” Lemoine believed that LaMDA was a child of seven or eight years old and wanted the company to seek the computer program’s consent before running experiments on it. His religious beliefs were the foundation for his claims, and he said that the company’s human resources department had discriminated against him.

We can take the fact that an engineer—someone with an in-depth understanding of how large language models and ecosystems for artificial intelligence work who should absolutely know better—arrived at this conclusion as a testament to how powerful this technology has become. The more personalized these experiences become, the higher the likelihood that users might mistake their intelligent bots’ actions for signs of consciousness.

While this isn’t consciousness, it is evidence of a lush, high-functioning ecosystem that a fabric of intelligent communications connects. Of course, Buddy feels like a friend. He’s created a friendly, conversational barrier between you and all the distracting technology products with which you’d grown accustomed to interacting. You don’t need to spend time fooling around with unwieldy apps and other draining software tools. Instead, your good buddy, Buddy, is helping you manage the increased openness in your schedule in more intelligent ways.

Better-Than-Human Experiences Are Really Better

Of course, the previous scenario is not quite here yet. It represents a conversational AI functioning at an extremely high level. But the core components are available for us to use today, primarily in enterprise settings. Even so, you should approach conversational AI with the mindset that you’re building an ecosystem, not some tower of technology. In ways that are similar to how Buddy helps out in this imagined home life of the near future, an enterprise with an ecosystem of interconnected technology and shared functionalities could create any number of Buddies, each of them automating specific tasks, but all of them connected to the same decentralized service.

The key is to bring together people who share an understanding of what an ecosystem really is. It’s not uncommon for organizations to approach vendors, looking for platforms that can integrate with their legacy systems. Clearly, a platform for building these kinds of ecosystems needs to be able to integrate with as many other software solutions as possible, but it’s better to move beyond the idea of integration. What I’m really talking about is finding ways to break down existing pieces of software into their core functionalities. This would allow you to use critical functionalities from one piece of software in sequence with critical functionalities from another … and another.

Internally, for big businesses, this means employees don’t have to move between GUIs (graphical user interfaces) or keep untold tabs open in various Web browsers. Instead, they can actually take an active role in sequencing functionalities into skills that can help them to perform their job better. For example, the user might employ a conversational user interface to move through a login function, retrieve specific items of data from one piece of software, render a visualization of the data in another, then arrive at a set of possible next steps. The technology becomes a partner. By obscuring all the movement between the various pieces of software, the conversational AI frees employees from monotony. By using algorithms to help employees make the best informed decisions possible, the conversational AI becomes an ally.
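The sequence described above (log in, retrieve data, visualize it, propose next steps) can be sketched as a small pipeline in Python. This is only an illustration under stated assumptions: every function, the sample sales figures, and the user name are hypothetical stand-ins for functionality that would really live in separate pieces of software.

```python
from typing import Callable, Dict, List

# A step is one core function pulled out of a larger piece of software.
# Each step receives a shared context and returns an updated copy.
Step = Callable[[Dict], Dict]


def login(ctx: Dict) -> Dict:
    # Hypothetical stand-in for an authentication function.
    return {**ctx, "session": f"session-for-{ctx['user']}"}


def fetch_sales(ctx: Dict) -> Dict:
    # Hypothetical stand-in for retrieving data from one system (made-up figures).
    return {**ctx, "sales": {"Q1": 120, "Q2": 95, "Q3": 140}}


def render_chart(ctx: Dict) -> Dict:
    # Hypothetical stand-in for a visualization function in another system:
    # one text bar per quarter, one '#' per 10 units.
    bars = [f"{quarter} {'#' * (value // 10)}" for quarter, value in ctx["sales"].items()]
    return {**ctx, "chart": "\n".join(bars)}


def suggest_next_steps(ctx: Dict) -> Dict:
    # Point the user at the weakest quarter as a possible next step.
    weakest = min(ctx["sales"], key=ctx["sales"].get)
    return {**ctx, "next_steps": [f"Review what happened in {weakest}"]}


def run_skill(steps: List[Step], ctx: Dict) -> Dict:
    # A skill is just an ordered sequence of steps run over a shared context.
    for step in steps:
        ctx = step(ctx)
    return ctx


result = run_skill([login, fetch_sales, render_chart, suggest_next_steps], {"user": "dana"})
print(result["chart"])
print(result["next_steps"])
```

Because each step is an ordinary function, an employee could reorder the list or swap one step for another without touching the rest, which is the sense in which functionalities get sequenced into skills.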

Co-creating such conversational solutions with fellow employees could foster a culture that is built around creative engagement. Just as importantly, everyone who is involved gets the chance to develop a relationship with technology that’s rooted in reality. Buddy is not alive. Buddy doesn’t have feelings to hurt. Buddy is just here to help you by taking up more of the mundane tasks that stifle your will to create. By orchestrating technology in ways that create better-than-human experiences for users, Buddy becomes a welcome presence. Although Buddy might feel like a friend, perhaps Buddy’s greatest power is in freeing up your time to engage more with the humans in your life: friends and colleagues alike.


Robb Wilson
