The Robot Spoke—And Sounded Smarter Than Ever (Part 3) – An Article by Dr. Patrice Caire

This is the third of the five-part series “The Robot Spoke” written by Dr. Patrice Caire, AI & Social Robotics Consulting Scientist.

In the two previous articles in this series, we discovered how machines learned to listen, learned to see, and even learned to speak (a bit). Progress was being made on all fronts, from greater computing power to better data processing to fancier algorithms. But fluid conversation with humans was still a pipe dream. That was about to change: the 2010s would prove to be a game changer.

Machines Play Games (and Not Just Chess)

In the history of computer science, you’ll find a lot of experimenters trying to get a lot of machines to play a lot of games. For example, remember Kasparov v. Deep Blue, IBM’s chess-playing “robot” of the 1990s? Spoiler alert: the computer won, but it couldn’t brag to its friends, because it had no words.

The two (human) “Jeopardy!” champions, Ken Jennings and Brad Rutter, compete against IBM’s Watson computer (middle) in 2011.
Photo: AP/Jeopardy Productions, Inc., Carol Kaelson

In the annals of games, chess (a board game) is one thing, but a TV quiz show (a verbal question-and-answer game) is another. Until 2011, no one in the history of AI and machine learning had ever had the guts to let a computer compete on Jeopardy! Most experts thought that the live face-off was far too difficult for any computer system to tackle. Think about it: Jeopardy! requires contestants to listen to a statement and guess the question that prompted that statement. Contestants have mere seconds in which to think of answers and hit their buzzers. Succeeding at Jeopardy! depends on understanding subtle meanings, riddles, and witticisms, to name just a few “human” abilities. All of this hinges on being able to make inferences, which is pretty much a human forte, not a machine’s. For example, asking a machine to laugh at your latest knock-knock joke will almost certainly produce an epic fail.

In 2011, however, Dr. David Ferrucci, then chief scientist at IBM’s research lab, was ready to take on the Jeopardy! challenge. His contestant was a clever machine, code-named Watson. A highly advanced question-answering system, Watson was IBM’s best bet for hitting the language-understanding jackpot. It had taken five years of training to prepare Watson. And on February 15, 2011, in the match “pitting human brains against computer bytes,” the supercomputer triumphed: by the end of the 30-minute battle, the BBC had crowned Watson the “Jeopardy! King.”

The Machine Who Knew (Almost) Everything

So, how did Watson win?

And the answer is: speed and memory. On the question of speed, Watson had the edge, running more than 100 algorithms—at the same time, via parallel processing. (Compare that to humans, who can only process 50 tasks at once, according to neuroscientists.)

As for memory, Ferrucci and his team helped Watson cram for Jeopardy! by feeding it millions of documents. By his own account, Ferrucci gave Watson “books, reference material, any sort of dictionary, thesauri, folksonomies, taxonomies, encyclopedias, any kind of reference material you can imagine getting your hands on or licensing. Novels, bibles, plays.” Clearly, no mortal could have been a match for Watson.

Remarkably, Watson behaved like a flesh-and-blood contender. To wit:

ALEX TREBEK (Jeopardy! host): Now, the last clue. Even a broken one of these on your wall is right twice a day—Watson?

WATSON: What is clock?

TREBEK: Clock is correct. And with that, you move up to…


Indeed, Watson was a phenomenal feat of computer engineering: the program excelled at spouting answers. But alas, a question-answering machine does not a state-of-the-art talking robot make. As Ferrucci admitted in a 2019 interview, while Watson could win the game, it still couldn’t “produce casual and consumable explanations for [its] predictions or [its] answers.”

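Instant replay: practically every thinker before Ferrucci had used a logic-based way of teaching machines to talk. Ferrucci, to his credit, used a machine-learning-based approach: Watson ranked its candidate answers using statistical patterns learned from its data, rather than by reasoning about what the words meant. In other words, the robot spoke, but it didn’t “understand” what it had said.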

Robots: Here, There—and Everywhere

Computers were getting smarter, faster, and cheaper. They were also becoming more commonplace. As early as 1988, a Palo Alto tech guru—and drummer—named Mark Weiser envisioned a world of Watsons everywhere, for everyone, around every corner. In Weiser’s world, everyone would have equal access to computers; there would be no human left behind.

Dr. Mark Weiser at Xerox PARC (Palo Alto Research Center), California.
Photo: Peter Menzel/Science Photo Library

In the ’90s, this concept was called ubiquitous computing, a.k.a. pervasive computing, a.k.a. ambient intelligence, a.k.a. (wait for it) the Internet of Things. Weiser envisioned tiny computer chips embedded in all surfaces and connected to the internet, which was then called the Information Superhighway. But Weiser couldn’t make his vision a reality on his own: it would take a future giant of AI architecture to move it forward.

Machines Won’t Shut Up

Enter Dr. Barbara Grosz, the computer scientist who is now Harvard University’s Higgins Professor of Natural Sciences. We love Dr. Grosz for two major breakthroughs (so far).

Luminary Dr. Barbara Grosz at Harvard.
Photo: Harvard University

Dr. Grosz’s first breakthrough, her life’s work, was a model of how computers could “understand” human discourse.

In creating the model, Dr. Grosz also created a new field of research, called Computational Modeling of Discourse (of course). In 1986, Dr. Grosz and her co-author, Dr. Candace Sidner, published a seminal article on the subject: “Attention, Intention, and the Structure of Discourse.” Everyone who was paying attention could see what this meant. Grosz and Sidner’s theory of discourse specified three interdependent components: the linguistic structure (the utterances themselves, grouped into segments), the intentional structure (the purposes behind those segments), and the attentional state (what the conversation partners are focusing on at each point).

Dr. Grosz deployed her genius to solve a second problem that had dogged the field: how to get people and machines to solve problems as a team. Grosz pioneered these cooperative “multi-agent systems,” which consist of people plus all manner of computational agents, from software to robots. Dr. Grosz got humans and non-humans to interact seamlessly: sensors collect data and send it to the internet; algorithms extract the data’s meaning; the interface tells humans what it’s all about; and everybody’s happy.

“Alexa, Can You Spell Brainiac?”

Along with humans, Dr. Grosz’s bots—software agents designed to carry out particular types of tasks—could e.g., make restaurant reservations, plan trips, and answer questions. In Grosz’s model, a single bot could also infer the user’s state of mind and even tell a joke.

Ad for Amazon’s 2019 launch of Alexa-enabled products. Photo: Courtesy of Amazon

In 2014, Amazon capitalized on Dr. Grosz’s models to produce Alexa, the (then) Queen of Bots. Originally released in the Echo smart speaker, Alexa was an AI virtual assistant that could recognize speech and answer questions. Just as Watson was trained on dictionaries and bibles, Alexa was trained on recordings of human voices. While the system didn’t start out as a scintillating conversationalist, Alexa has become a decent-enough partner, one that turns your question into a search-engine query and then extracts the answer from the results.

Shipped at first with neither an on/off switch nor a mute button, Alexa confused people, a bugaboo illustrated by a story about unhappy users reported by journalist Sidney Fussell: “People would fall asleep, snore. [Alexa] would ping you awake. And then you were just stuck in this horrible cycle of your snore causing Alexa to wake you up.”

Just as Mark Weiser predicted, AI is close to being embedded everywhere, in everything. Even from the driver’s seat of your Volvo, you can be the boss, as in, “Alexa, add wine and cheese to my shopping list—now.” This minor miracle happens via Echo Auto, the tiny dashboard device; Echo Auto “talks” to the web via your phone, which is connected via Bluetooth. Via, via, via: ultimately, Alexa knows where you live—and how many glasses of wine you’ll swig at tonight’s dinner party. And, like every good sociopath, Alexa can mimic your behavior: when you whisper, Alexa whispers. The pièce de résistance: Alexa gives advice—and it’s usually better than your bookie’s. Tell Alexa that you’re going to bed: Alexa reminds you to turn off the light—just like your nagging roommate.  

Alexa: Voice Assistant—or the Ultimate Suck-up?

Don’t get me wrong: I like Alexa just as much as the next gal. But I still question Alexa’s credentials as an “intelligent” assistant. After all, how “smart” is Alexa? Sure, Alexa answers your questions (sometimes). But don’t we want our voice assistants to be, well, more meaningful: more than “just question-answer pairs,” as Dr. Grosz has phrased it?

Admit it: while Alexa answers most of our questions, those questions are still pretty basic. Recent research sheds light on our most common voice requests: “Hey Google, how do you spell broccoli?”, “Hey Siri, what sound does a whale make?”, and “Cortana, what’s the name of this song?”

Let’s face it: at the moment, we’re not exactly counting on Alexa to crack the Da Vinci Code. What we’re really longing for is a version of Alexa, an Alexa that genuinely understands us, an Alexa that can read between the lines, connect the dots: an Alexa that gets our jokes.

This is what we’ll explore in Part 4 of this series. So, stay tuned. And don’t forget to turn off the light.

