Idea Amplifiers, Activity Streams, and the Distributed Supercomputer: A Conversation Around Hypervoice
Published: January 7, 2013
Martin Geddes is one of the few people whose opinion I uniformly respect and consistently listen to. I don’t feel this way just from reading his newsletters, but because we worked together during an exciting time at Sprint, periodically coauthoring whitepapers and designing revolutionary mobile ecosystems that never quite came to pass. Martin has kept up this work, and I think of him as being somewhere between a quite successful strategic telecommunications (telecom) consultant and a radical futurist. He is perhaps the most well-informed person about telecommunications—as both technological and social phenomena—that I know of.
At Oracle OpenWorld, in October 2012, Martin revealed a concept called hypervoice, a term he coined to describe a breakthrough innovation that HarQen had achieved but was struggling to explain. During a meeting with them, he heard the phrase “links what you say to what you do,” and it immediately became clear that a new hypermedium had been invented. Best of all, hypervoice is not just a concept, but a working product.
More recently, I sat down over lunch in San Francisco with Martin and Mudassir Azeemi, a UX designer I know, ostensibly to discuss hypervoice. We also ended up discussing the theoretical underpinnings of networks, the psychological and social pressures of communications, and the future of the Internet.
Steven: What’s wrong with traditional voice telecommunications—what we’re all used to today? Voice usage has been dropping, apparently in favor of other channels, for a few years now. Is this the end, or does it just mark a need for change?
Martin: I’ll answer your question by flipping it on its head. What does voice telecommunications do right? There has clearly been value in it for over 100 years. It tries to recreate the experience of being there with somebody else. The highest aspiration a telephone call had was to be as good as being there. It falls short in that it creates a session-based interaction that doesn’t match how we are interacting physically in the same space.
We can have an ongoing relationship, a conversation, across several media. I was outside on the street, checking my email to see where we were supposed to meet. We bumped into each other in the reception area, but there was no obvious place the session started.
Humans are not session-based creatures. When we met each other here, I didn’t recognize you from behind at first. Then you turned around and saw me, and we had a shared context, which was that we were both now in the same restaurant, where we had intended to be.
But a telephone call is not contextual; it brings no context into an interaction. So, in many ways, it’s quite unnatural. The early telephone companies had to teach their users what to do and say. Before the telephone, people regarded speaking to someone to whom they hadn’t been introduced as impolite. It was a wholly unnatural experience in that cultural context.
So, a whole bunch of social expectations were slowly established as to how we should talk on the phone. A dialect emerged around telephone calling that hadn’t previously existed.
In the meantime, the world has moved on: we’re now all carrying portable computers, and there is a multiplicity of ways for us to communicate with one another—and in particular, we’ve gotten used to these little minicomputers being amplifiers for every thought and idea that we have.
The things that we haven’t yet been able to amplify—aside from merely recreating voice at a distance—are our spoken thoughts, ideas, and gestures. If I text you, you can forward the text; if I tweet, you can retweet it. Twitter is a classic amplification engine for ideas.
But every spoken word is ephemeral and unindexed. So, in a world where you’ve got a choice between amplified thoughts and unplugged thoughts, the rock-and-roll version of communications will gather a lot more attention and use versus the purely acoustic version.
So, not surprisingly, telecommunications’ comparative advantage starts to fade away.
Mudassir: And that’s the reason text messaging and other text-based communications are on the rise? That’s why the mobile industry is capping minutes?
Martin: Yeah. The ratio of cost to benefit is changing. In some ways, telecommunications is becoming more costly, because people are spending more of their time in telephony-unfriendly environments. Meanwhile, the relative benefits of other channels keep improving.
Steven: So, what should telecommunications—which I guess we can interpret as meaning telepresent communications—be like? What attributes do they need to express for us to bring them more adequately into the digital era?
Martin: If you are a phone company, you are caught in a very awkward place. The mantra for the last ten years has been that all the OTT (Over The Top) players are coming, and telecoms will just become a dumb pipe. The world is a lot more complex than that. Spoken voice has been anchored to telephony for a very long time, and there is a complex and sophisticated set of technology, as well as social norms and government systems, to keep it there. There’s no particular reason why vertical integration is evil.
Apple does lots of vertical integration, and people don’t regard it as evil. So, if phone companies can deliver a better human communications experience based on their long history of delivering this stuff, why not?
There’s value in the networks and relationships and systems that we’ve got. If we were to start again, we wouldn’t design things the way they are. But, like the drunk in the joke who searches for his lost keys under the streetlamp because that’s where the light is, this is where we are.
Steven: Something you said a few months ago was to characterize Twitter and Facebook as hyper-messaging platforms: platforms whose primary purpose is to link to or to enable access to more information. Hypervoice seems like not just a great thing in itself, but a herald for a new age of hyper-everything. What other media are failing in their promise, and so need to become more connected, a more integral part of our conversations?
Martin: YouTube is really a kind of hyper-video. You can overlay bits of information onto videos. Something becomes hyper when it gets a URL. If it doesn’t have a URL, it doesn’t exist. Email has traditionally not had URLs, so I can’t easily point to and publish an email message.
The necessary part of hypervoice is that voice objects now have URLs. But that’s not sufficient. Linking is the sufficient part. Linking what people said to what they did. So, all the objects we touch or interact with throughout a voice conversation, all the PowerPoint decks or notes or trouble tickets or Web pages that we view, all the gestures that we make all get linked back to what we say.
All together that enables a whole new way of thinking about voice. Rather than an ephemeral thing, it is a permanent, digital asset that we are creating.
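Author’s note—To make the linking model concrete, here is a minimal sketch of what a hypervoice data structure might look like. All names and URLs are hypothetical illustrations of the idea Martin describes, not HarQen’s actual implementation: the recording itself is addressable by URL, and timestamped links tie the artifacts touched during the call back to moments in the audio.

```python
from dataclasses import dataclass, field

@dataclass
class Moment:
    """A link from a point in the recording to something touched during the call."""
    offset_seconds: float   # when in the recording this happened
    artifact_url: str       # the deck, note, ticket, or page that was touched
    note: str = ""          # optional annotation

@dataclass
class HypervoiceCall:
    """A voice conversation as a permanent, addressable digital asset."""
    recording_url: str                          # the call itself has a URL
    moments: list = field(default_factory=list)

    def link(self, offset_seconds, artifact_url, note=""):
        """Tie what was done (an artifact) to what was said (a moment)."""
        self.moments.append(Moment(offset_seconds, artifact_url, note))

    def index_for(self, artifact_url):
        """Find every moment in the call tied to a given artifact."""
        return [m for m in self.moments if m.artifact_url == artifact_url]

call = HypervoiceCall("https://example.com/calls/1234")
call.link(312.5, "https://example.com/decks/q3.pptx", "reviewed slide 7")
call.link(480.0, "https://example.com/tickets/88", "opened trouble ticket")
print(len(call.index_for("https://example.com/tickets/88")))  # 1
```

The point of the sketch is the two-way index: the artifact becomes an index into the words, and the words become an index into the artifact.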
This is a prototype of a hypervoice conversation. Steven will replay it, mark it up in various ways, and the notes you take—if you were using a LiveScribe pen—would be tied back to the moment at which I was speaking.
Steven: There’s no reason you can’t tie a service like this to an arbitrary device, so if someone like LiveScribe gets on board, their pens would become standards compliant, and you could use that data within your existing workflow.
Martin: Yes, you should take the hypervoice stuff and embed it into other things like Microsoft Office, so people could continue to use Oracle Social Network to take their notes, but still tie them back to the tools they prefer and are comfortable with rather than forcing them to use a whole new tool.
Everything has to fit within the existing workflows people use, with minimal changes. For example, I use a Kanban system for managing all my tasks. I have a very busy life, so I have about 12 different swimlanes and different statuses these things go into. It’s okay, but I want a system that tells me what I need to be doing next. Our relationships with computers are not quite adversarial, but oppositional. The machine should be working in lockstep with us.
Steven: Reading through your presentations and papers, I was less struck by how radical the ideas were than how natural they seemed. And also, that they were proposed decades ago. If I may, let me show you two quotations from Vannevar Bush’s 1945 article “As We May Think,” in The Atlantic: 
“One can now picture a future investigator in his laboratory. His hands are free, and he is not anchored. As he moves about and observes, he photographs and comments. Time is automatically recorded to tie the two records together. If he goes into the field, he may be connected by radio to his recorder. As he ponders over his notes in the evening, he again talks his comments into the record. His typed record, as well as his photographs, may both be in miniature, so that he projects them for examination.”
“All our steps in creating or absorbing material of the record proceed through one of the senses—the tactile when we touch keys, the oral when we speak or listen, the visual when we read. Is it not possible that some day the path may be established more directly?”
The popular mythology of the information age is that people demand something better, then go build it. But, in fact, as you were just talking about, there are a lot of regulatory frameworks and even inertia.
How did we get here? Or, more usefully, how do we get out of where we are to make this future happen? How do you see the future evolving so that these kinds of services aren’t little, fun niches in specialized enterprise systems, but the way we all get to a hyper-everything world?
Martin: The conference I am here in San Francisco to attend is the WebRTC Expo, which is allegedly about putting voice on the Web. But it comes from such a narrow and unimaginative place, which is that the highest aspiration is to put real-time, streaming, two-way audio and video into the browser. This totally misunderstands what the Web is about, which is about linking. It isn’t Web voice at all; it’s just browser voice.
It’s useful, but it’s neither necessary nor sufficient. We’ve got these things called telcos, and they deliver really quite good, high-quality voice already. To make hypervoice work, you don’t need browser voice at all. It’s irrelevant.
The question might be: where does this all go? Our idea of a browser is currently very limited. It’s still locked into the 1990s paradigm of basically client-server scaled up to global scale.
There are these two paradigms, one of which is the document hyperlinking paradigm; the other, activity streams. Twitter is all about activity streams. It’s not about documents. You can turn an individual tweet into a document, but it’s really about the temporal relationship.
And so the current Web is time blind. It’s like someone who knows only spatial metaphors, but there’s this other dimension of living called time. And that is important. And the Web doesn’t understand it. Today, the Internet and the Web are both prototypes, each with severe holes and faults. Neither of them is temporal.
Steven: Just the other day, I was outlining the limitations of current voice communications as a reason why we use texting and other means of sending data instead. Voice is, today, transient. You can’t tie it to data. You can’t review it. So if you mishear something, it’s just gone.
But there are bits and pieces that have pushed the boundaries for some time. For example, LiveScribe is a consumer product that comes to mind as being conceptually similar to hypervoice. But their approach is not universal, or standard. I cannot press *22 on just any phone to jump back 10 seconds. My LiveScribe notes are in a proprietary format.
Why are we stuck in a world where these ideas bounce around and don’t go anywhere? Why is hypervoice different?
Martin: Just last week, I rewatched Steve Jobs launching the original iPhone, and he talks about the revolutions that are driven by the introduction of a new interaction paradigm. We had the mouse, then the iPod’s click wheel, and the third one was multi-touch.
So, what’s changing is that we can expect people to have a screen during every conversation. We can expect them to be interacting with digital objects during a conversation, whether it’s typing notes or adding a tag to a moment in the conversation.
But if you were to capture all of this indiscriminately, it would be both too much and too little: it wouldn’t indicate where the moments of significance actually are. The insight is like Google’s. When I type some notes or mention something, I am attaching quite a bit of significance to that moment, and it’s that implicit metadata that matters.
Before Google, AltaVista tried to work out the significance of the page by the content on the page. Post-Google, if I type the word Volkswagen, then put an anchor around it and point it over there, I am saying that content is about Volkswagens, regardless of what it says on that page. So, it gave a great deal of significance to the implicit metadata that people were creating with these links and attributes.
This is the same. HarQen is the Google of hypervoice, Google is the Google of hypertext, and Twitter is the Google of hyper-messaging.
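Author’s note—The anchor-text shift Martin describes can be sketched in a few lines. This is my own toy illustration (all pages and links hypothetical), not Google’s actual algorithm: a page gets indexed under the words other people use when linking to it, regardless of its own content.

```python
from collections import defaultdict

# Hypothetical link graph: (source_page, anchor_text, target_page).
links = [
    ("blog.example/a", "Volkswagen", "vw.example/home"),
    ("news.example/b", "Volkswagen dealer", "vw.example/home"),
    ("blog.example/a", "recipes", "food.example/pasta"),
]

# Index each target page under the words in the anchors pointing at it.
# The page is "about" whatever others say when they link to it.
anchor_index = defaultdict(set)
for _source, anchor, target in links:
    for word in anchor.lower().split():
        anchor_index[word].add(target)

print(anchor_index["volkswagen"])  # {'vw.example/home'}
```

Pre-Google engines like AltaVista read only the page’s own words; the index above reads only what linkers say, which is the implicit metadata Martin is pointing at.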
Steven: Is the world ready for this? I am thinking of how poorly we have all embraced other technologies, like GPS (Global Positioning System). Ten or fifteen years ago, people like you and me worked on projects where we developed many of the possible ways to use location to enhance context and preemptively assist users. Yet the LBS (Location-Based Services) conversation today is still mostly about how to get better turn-by-turn directions using your GPS.
So, we use these enabling technologies poorly. If you waved a wand and everyone had hypervoice—or hyper-everything—this afternoon, would people use it? Would they use it right? Is this a generational shift, or can we move to not just implementing this, but using this within a reasonable timeframe?
Martin: There are a number of forces at play here, one of which is fixed and immutable, which is that human generations are turning over. So, there is a generation of people growing up with the expectation that much of their life is being recorded by computers.
Then, there’s technology readiness. The first Web browser appeared in about 1992. I can remember using Netscape on Unix in a windowless room, doing banking applications many years ago. It took two to three years for the Web to hit the mainstream, and years more before many issues were worked out.
I sense that this change is similar in that the technology is ready. There are all kinds of burrs and rough edges around the interfaces and interactions. How does search work? There’s a first level of etiquette around how telephone calls should work. But how do you manage the etiquette of a hypervoice call? We might both have to agree to be recorded, and if anyone wants to listen to the recording, it might require the assent of both parties. Or, maybe it’s recorded in half-duplex, so I am allowed to share the words I said, but I can’t share your words without your permission.
It wouldn’t surprise me if, twenty or thirty years from now, most conversations were hypervoice conversations—even ones like this: we could naturally record it.
Steven: So, is it your expectation that you are going to have to decide some of this and essentially train people like telecoms did in the 1890s?
Martin: Maybe a little bit. When there’s a large organization taking a hypervoice approach, for example, even moving to social media and messaging is an organizational transformation that requires policies and training. But it should be an obvious, natural extension to social business. Thinking that social equals something like Twitter is too narrow. We network socially in many ways and have for millennia.
I have said that hypervoice is the missing carriage from the Cluetrain. 
There is a new organization to coordinate this. And it has all kinds of interesting and subtle problems to solve, from technology, to public policy, to social bonds. As I just said, how, for example, should assent be given to having our calls recorded? And, of course, hypervoice calls record much more, so how do existing call-recording laws relate to that? You may get into a conflict between laws about recording telephone calls and laws about recording every other type of media.
Or consider the impact of new protocols. Should I have a copy of the list of links you’ve been building during our shared hypervoice call? You’ve been typing notes. Am I allowed to use your notes as an index to my words?
Mudassir: Could there be a case where one caller opts out of a hypervoice call? Does this sort of scenario exist?
Martin: This is why you start off doing things inside enterprises, where you basically have got an authoritarian environment. You have a form of imposed consent.
Steven: It’s nice to turn it on its head. Normally, they say you can’t record anything, but I like the inversion: no, we’ll record everything, but in a codified way, so we can all share it.
Martin: In my view, there’s a happy middle ground where all my calls should be recorded, but any access to that information, any use of the information, I should know about. Messaging this, so it’s clear and not overwhelming, is a serious problem though.
Steven: One of the things I see people talk about, especially when there is some crisis of plagiarism or Photoshopping a picture, is that you should always have access to the original source, so everyone can check the veracity.
And at first glance, it seems that you are making it easier for me to cherry-pick the data out of a voice stream, which otherwise would have been a pain to cut down.
But since you are building it from the ground up, I wonder if maybe this can go a bit toward some of the original intent of hypermedia and make everything two way.
Martin: You mentioned location services, where you have this binary thinking that either you share or don’t share. I am willing to share my location with lots of people, on the basis that, if they access it, I know and maybe give assent. Perhaps asking in an automated way: why do you need my location at this time?
If it were in the minutes leading up to this meeting, it would be utterly unremarkable if you were to want to know where I was.
Steven: There’s this brilliant thing Jared Benson showed off at a conference we hosted in 2008. He had a little chart that described combining physical and social proximity.
You aren’t from Kansas City and have no family there, so I might reveal my location to you as Kansas City. But my wife needs to know I am at HyVee, because I don’t mind her knowing, and she can do something with that. It has to do with not just privacy, but relevance.
For you, for the most part, I just need to know either that you are not nearby, or that you are in San Francisco and so am I! Related to your earlier thoughts about the Internet being generally unaware of time, I’d also like to know whether you will be where I am next week, so we can integrate schedules and other data sources with this.
It’s exciting that you are working on something this early and this broadly, so you can look at all the failure points from before and fix them.
Martin: And the good news is that I don’t need to change telephony at all. Not that telephony won’t be changing as a market. The way we market and buy telecom services will be changing.
My belief is that the next realm of stuff is packaged cloud services: beautifully packaged services that, like telephony, give us all the productivity and all the features in one go, and that will become generic across cloud services. And their relationship with the current Internet is not going to be the future. It is not, mathematically, going to be the future. That surprises a lot of people.
The expedient choices we made in the 1970s to take advantage of little things have a price, and we’re going to pay. The Internet has a bunch of fundamental problems that require major surgery or replacement.
Steven: I am reminded of all the times we almost got a different Internet. One I recall was an article on HyperCard, in which its creator, Bill Atkinson, said that, in retrospect, he drew the diagram incorrectly:
“Atkinson recalled engineers at Apple drawing network schematics in the form of a bunch of boxes linked together. Sun engineers, however, first drew the network's backbone and then hung boxes off of it. It's a critical difference, and he feels it hindered him.” 
We’re still living with this. We build computers that happen to plug in or happen to have radios, instead of building networks. No matter how powerful the device is now, we are still fundamentally building, say, a mobile app and forgetting that it needs to attach to a service, so it does so very clumsily.
Martin: Actually, it’s all just computing. There is no network. There is no cloud. There is no device. All the dividing lines are in your head. This is only distributed computing; that’s all it is. There is an opportunity to recognize that networks are just large, distributed supercomputers and to kind of do what Apple did for devices to the network.
The conceptual foundations of the telecoms industry are missing. Computing has strong foundations. There is a whole discipline of computability. Turing sorted it. Telecoms is the business of translocating information. There is not a theory of translocation. As Turing envisaged computers, he fuzzed the network bits. He assumed the machine could see the tape. Time is left out of this. All networks do is move and delay stuff.
The whole of the telecoms industry has fractured foundations, and telcos act in all sorts of crazy, irrational ways without knowing it, because they don’t understand the fundamental nature of the business they are actually in.
Take queuing theory. They’ve started from the wrong place. They need to go way, way, way, all the way back. The very first assumption people made about networks is that they are machines that deliver packets. So that’s what we draw. There’s a belief system that is all about bandwidth, as if packets are still objects, and that is what’s going to help us in the future.
The packets are irrelevant. What we’re doing is trying to translocate information, and all this machine does is impair that process. Possibly infinitely.
So you have to step through the looking glass and see the whole thing from the other side. And when you do that, you realize that, actually, the place where you were standing was the mad world, and you’ve been born into the Wonderland side.
Steven: So, we need a general theory—and preferably a testable model—of translocatable computability.
Martin: Some of my friends have developed this. We have it. Not really published, but the principles are out there in public. It turns out to be way simpler than you’d expect. The whole essence is that the system has two degrees of freedom. You have load, loss, and delay. Like temperature, pressure, volume. Pick any two, the other one is set. That’s it. That’s all networks do.
If you think that networks do work—because it’s part of the name—if you think they are about delivering packets, you get it all wrong, because you overdeliver packets, as it were. You stuff more packets into the system because you believe they deliver value.
But all you’re doing is putting things in the way of other things and, potentially, creating negative value.
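Author’s note—Martin’s “pick any two” claim can be illustrated with a textbook queueing model. This is my own toy stand-in (an M/M/1/K queue), not the formal framework he alludes to: once the offered load and the buffer size are fixed, both the loss probability and the mean occupancy (and hence, via Little’s law, the delay) are determined. Sizing the buffer only trades one impairment against the other.

```python
def mm1k(rho, K):
    """Loss probability and mean occupancy of an M/M/1/K queue.

    rho: offered load (arrival rate / service rate)
    K:   buffer size (max customers in the system)
    Once rho and K are fixed, loss and occupancy (hence delay)
    are both determined -- two degrees of freedom, not three.
    """
    if rho == 1.0:
        return 1.0 / (K + 1), K / 2.0
    p_loss = (1 - rho) * rho**K / (1 - rho**(K + 1))
    mean_occupancy = rho / (1 - rho) - (K + 1) * rho**(K + 1) / (1 - rho**(K + 1))
    return p_loss, mean_occupancy

# Same offered load, two buffer sizes: the small buffer drops more
# packets but queues less; the big buffer does the opposite.
p_small, n_small = mm1k(0.9, 5)
p_large, n_large = mm1k(0.9, 50)
print(p_small > p_large and n_small < n_large)  # True
```

The point is not the particular formula but the shape of the trade space: stuffing more load into the system, or buffering it more deeply, only moves impairment between loss and delay.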
Author’s note—For more information, read Martin Geddes’s presentation on lean networking.
Steven: So, for example, when you have said QoS (Quality of Service) as it’s done today is bad, you mean the guarantee of a level of service, right?
Martin: It’s not the guarantee, but the concept of locally optimizing a system; the network-wide effects are disastrous. Framing Quality of Service in terms of bandwidth turns out to be a dumb idea, because you are then in the business of allocating impairment, not of giving priority. You choose to impair some flows less.
1. Bush, Vannevar. “As We May Think.” The Atlantic, July 1945. Retrieved December 5, 2012.
2. Weinberger, David, Rick Levine, Christopher Locke, and Doc Searls. The Cluetrain Manifesto. Cambridge, Massachusetts: Basic Books, 2000.
3. Kahney, Leander. “HyperCard: What Could Have Been.” Wired, August 14, 2002. Retrieved December 7, 2012.