CES 2020: Google announces user milestone and looks to embed deeper into smart homes

Enhancements to Google Assistant include new scheduling capabilities and a more natural-sounding voice for reading long-form text.


TechRepublic's Karen Roby talked to Scott Huffman, vice president of engineering for Google Assistant, at CES 2020 about the artificial intelligence used in Google Assistant and the future of machine learning. The following is an edited transcript of their conversation. 

Scott Huffman: We're really excited to be here at CES, and we're really showcasing a couple of things. One is lots of the amazing devices that both have the Google Assistant built in and that you can use your voice with the Assistant to control. And then we have a lot of stations showing demos of new functionality that's coming to the Assistant. We're really trying to just make your life a little simpler, a little easier with the Assistant. We always say we want you to just be able to ask for whatever you want or need and it happens for you. So we're showing off some new ways that we're doing that.

SEE: CES 2020: The big trends for business (ZDNet/TechRepublic special feature)

Karen Roby: And in terms of Google Assistant, it can already do so much, and you guys just keep making it better, enhancing different features. So talk specifically about some of the language additions and things like that.

Scott Huffman: One thing we're really excited about is that we recently crossed the big milestone, which is over 500 million people every month have a conversation with the Google Assistant, and it's in 30 different languages, 90 different countries. So for me, just as a computer scientist, that's pretty amazing.

That many people talking to Google to get something done. We've been constantly working on improving our language and our understanding. One thing that we're showing off here is a feature that we call Read It Now. What Read It Now does is let you, if you're looking at a webpage, say, "Hey, read that webpage, my hands are busy." And what we've done behind the scenes is create better voices that sound more natural when reading that longer content. We've had great voices for a while, like John Legend, who can read the weather--voices very optimized for reading short things.

These new voices are very good at reading a whole news article, a web page, a book chapter while sounding very natural, without fatiguing your ear. To do this, the team actually worked with a bunch of different voice actors, had them read longer-form content, and then used our machine learning model, which is called WaveNet--our technology for creating voices. What's interesting about WaveNet compared to other technology is that it's able to create a great voice with not nearly as much data. The model that WaveNet creates captures what's called prosody--if you think about how your voice goes up and down as you talk, it's different if you're reading something long versus saying something short. And the models are able to capture that.

Karen Roby: Very interesting. You talked about machine learning: Talk a little bit more about the tech behind the tech.

Scott Huffman: The Assistant really is completely, these days, powered by machine learning. One stat that I love is: think, what's the simplest thing an Assistant can do for you? Well, maybe it's set an alarm. Setting the alarm for 6 a.m.--that's simple, right?

But even in English, we see people say that to us in over 3,000 different syntactic variants. They say it in every different way, and so you really can't just write all that down. You need machine learning that's able to look at examples and really understand more deeply: when someone says this, what do they mean? One of my favorite examples is that we see people say things like, "Hey, I have a flight in the morning. Wake me up at 6 a.m." And we're supposed to understand that, oh yeah, that part about the flight in the morning, that's not important, but I better wake you up at 6 a.m.--that part is the important part. Machine learning is letting us capture those kinds of variations.

Karen Roby: I know that you have a very technical background, so what is it that really excites you maybe as you look down the line at what's to come?

Scott Huffman: I think one thing that gets me excited is really this vision of ambient computing, where we've gone obviously from a computer the size of this room, to a computer that I have to go over to my desk and use, to a computer in my pocket. But we think the next step, maybe, is the computer is just here with me. And I'm starting to see it in my house. It's actually kind of interesting because, you can imagine, I have these Assistant devices in most of the rooms in our house, and I watch my teenage kids, and for them it's very natural. As they walk around the house, they think to themselves, "Oh, I have this question." They just shout out, "Hey Google, blah blah blah," and they get the answer.

For them, it's just the natural way, to the point where they don't ask their mom and me anything anymore. We don't know anything, but they ask Google.

I was watching my son the other day studying for his biology final. He's studying his notes, and then he just starts asking Google, "Hey, is a chromosome a diploid?" And Google answered it. Then he starts firing questions like that one after the other, Google's answering them, he's writing notes, and I'm like, "Okay, this is cool." It's like he has a built-in tutor right there in the room. It's good. I don't know any biology, so it's a good thing.

Karen Roby: That's a nice window into what that generation, what things will look like for them. And one more time, Scott, before we let you go. Say again: It was 500 million?

Scott Huffman: 500 million people around the world, 30 different languages, in 90 countries will have a conversation with Google in the next month.
