Around this time last year, Google updated Chrome,
adding a unique feature to the company’s web browser: speech recognition. Six
months later, Tal Ater, a subject-matter expert in this field, discovered what he considered a serious breach of
security in the Chrome web browser. The culprit: speech recognition.

How Chrome’s speech recognition works

Google created a speech-recognition Application Programming Interface
(API) that tells developers building websites how to interact with Google
Chrome and the computer’s microphone. The purpose is to give visitors to
the website the ability to control their experience using voice commands,
rather than having to type or click.

What makes the feature interesting is that Google transcribes the voice
command into text. After transcription, Chrome sends the text to the website, where
the web server deciphers the command and then executes it. Visiting this link will demonstrate the speech-recognition API.
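
To make that flow a little more concrete, here is a minimal TypeScript sketch of how a page's script might consume the transcribed text. It is an illustration under stated assumptions, not Google's or any website's actual code: the interfaces are simplified stand-ins for the browser's speech-recognition objects, and `finalTranscript`, `listenForCommands`, and the `handleCommand` callback are hypothetical names chosen for this example.

```typescript
// Simplified stand-ins for the browser's Web Speech API objects, declared here
// so the sketch is self-contained. They are assumptions, not full definitions.
interface Alternative { readonly transcript: string; readonly confidence: number; }
interface Result {
  readonly isFinal: boolean;
  readonly length: number;
  readonly [index: number]: Alternative;
}
interface ResultList { readonly length: number; readonly [index: number]: Result; }
interface RecognitionEvent { readonly resultIndex: number; readonly results: ResultList; }
interface Recognizer {
  continuous: boolean;
  interimResults: boolean;
  lang: string;
  onresult: ((event: RecognitionEvent) => void) | null;
  start(): void;
  stop(): void;
}

// Join the top-ranked alternative of every finalized result, starting at
// resultIndex, and skip interim (not yet final) results.
function finalTranscript(event: RecognitionEvent): string {
  const parts: string[] = [];
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const best = result ? result[0] : undefined;
    if (result && result.isFinal && best) {
      parts.push(best.transcript.trim());
    }
  }
  return parts.join(" ");
}

// Configure a recognizer and pass each finalized transcript to a handler.
// handleCommand is hypothetical: it is whatever the site does with the text.
function listenForCommands(
  recognizer: Recognizer,
  handleCommand: (text: string) => void
): void {
  recognizer.continuous = true;      // keep listening after the first phrase
  recognizer.interimResults = false; // only deliver finalized text
  recognizer.lang = "en-US";
  recognizer.onresult = (event) => {
    const text = finalTranscript(event);
    if (text.length > 0) {
      handleCommand(text);
    }
  };
  // In Chrome, starting recognition is the step that requires microphone permission.
  recognizer.start();
}
```

In a Chrome page, the recognizer would come from the browser's prefixed `webkitSpeechRecognition` constructor; the key point is that the page only ever sees text, never the raw audio.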

Ater’s contention

When visitors first arrive at a speech-recognition-enabled
website, they are offered a choice: interface with the website
normally, or give the website permission to use the microphone.

 

 

There should be an indication similar to the slide seen above, notifying the user that the microphone is active. Ater’s security concern
centers on how a website can enable the microphone without advertising that
it is active. One example is what he called a pop-under window:

“When you click the button to
start or stop the speech recognition on the site, what you won’t notice is that
the site may have also opened another hidden pop-under window. This window can
wait until the main site is closed, and then start listening in without asking
for permission. This can be done in a window that you never saw, never
interacted with, and probably didn’t even know was there.”

This may be a bit difficult to visualize. To
clarify the process, Ater created a YouTube video showing how the pop-under
window works.

Bottom line: if Ater’s contention is valid, putting
Chrome’s speech-recognition API in the hands of an ill-intentioned website
developer could turn a remote computer’s Chrome web browser and built-in microphone
into a listening device.

How the listening device works

Let’s say a bad guy creates a malicious website
that uses speech recognition. The malicious website appears to be
an exact duplicate of someone’s favorite website. That user receives an email saying
a gift is waiting at the favorite website; all the user has to do is click the link.
Unknown to the user, it is a phishing email, and the link leads to the
malicious website instead. There, the user is asked to try
the new speech-recognition feature, and says yes.

According to Ater, the user’s computer is now a
remote listening device. The malicious site will be able to monitor everything
within range of the microphone, whether the user knows it or not.

Google or Ater, who is right?

Ater first reported his findings privately to
Google in September 2013, and said Google engineers had a fix within weeks. Then,
a week ago, having seen no evidence that Google had removed the bug from Chrome, Ater
decided to go public:

“As of today, almost four
months after learning about this issue, Google is still waiting for the standards
group to agree on the best course of action, and your browser is still
vulnerable.”

The standards group Ater referred to is the World Wide Web Consortium (W3C).
Google believes its implementation of the speech-recognition API agrees
with Section 4, Security and Privacy Considerations, of the W3C report about speech recognition.

Ater disagrees:

“[T]he web’s standards
organization, the W3C, has already defined the correct behavior which would’ve
prevented this… This was done in their specification for the Web Speech API,
back in October 2012.”

Options to prevent eavesdropping

I want to reiterate: for speech recognition to
work, the visitor must first give
the website permission to use the computer’s microphone. If permission is not
given, the exploit falls apart.
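
From the website's side, a refusal shows up as an error rather than as audio or text. The TypeScript sketch below illustrates that idea using error codes I understand to be defined in the W3C Web Speech API specification ("not-allowed" and "service-not-allowed"). The function names `isPermissionDenied` and `explainError` are hypothetical, and this is a simplified illustration, not a description of how any particular site handles errors.

```typescript
// Error codes that, per the Web Speech API specification as I understand it,
// mean the browser is refusing speech input or the speech service.
const DENIED_CODES: readonly string[] = ["not-allowed", "service-not-allowed"];

// True when the error code indicates that permission was refused.
function isPermissionDenied(code: string): boolean {
  return DENIED_CODES.includes(code);
}

// Turn an error code into a short message a page could show the visitor.
// The wording of the messages is an assumption made for this example.
function explainError(code: string): string {
  if (isPermissionDenied(code)) {
    return "Microphone access was refused; nothing is being recorded.";
  }
  if (code === "no-speech") {
    return "No speech was detected.";
  }
  if (code === "audio-capture") {
    return "No working microphone was found.";
  }
  return "Speech recognition stopped: " + code;
}
```

A page would typically call something like `explainError(event.error)` from the recognizer's error handler, which is why a site that never receives permission has nothing to listen to.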

There are ways to prevent eavesdropping for those
who want to use speech recognition. There are also ways to disable speech
recognition completely. For example:

The default setting in Chrome is “Ask if a microphone requires access” (see slide
below). One option is to trust that Chrome asking for permission, plus some
kind of indication that the microphone is on, will be enough security.

Users who visit sites that use speech recognition and want to use it, but do not
trust the software indicator, can toggle the microphone on and
off as shown below.

Users who are more concerned about eavesdropping than about using speech recognition can
click on the setting circled in red (as seen below) and leave it selected.

 

 

One problem: all of the above options are software-based.
There is no hard-wired switch to shut the on-board microphone off. For
those concerned about this, there are two additional options:

Visit the Web Speech API demonstration website I mentioned earlier. If the microphone is off, you will get
verification similar to the slide below.

For those who want to be absolutely sure, physically disable the on-board
microphone, and when a microphone is required, plug an auxiliary microphone
into the appropriate socket.