Vocal disguises and impersonations may fool voice recognition authentication

Research indicates that impersonating a voice can fool voice recognition authentication systems. Multimodal biometric authentication might be a better security option.

Pundits predict that biometrics will eventually replace passwords as the preferred method of authenticating users.

On the surface, using physical features instead of passwords seems like a win-win for everyone concerned. There is nothing to remember, and many biometric identifiers can be used to identify users, such as fingerprints, finger geometry, iris recognition, vein recognition, retina scanning, voice recognition, and DNA matching.

What is voice recognition?

One method of biometric authentication that is coming into its own is voice recognition. The science required to analyze voice patterns has been in place for some time, but only recently has technology provided the statistical, analytical, and data-processing techniques needed to support voice recognition, writes Dualta Currie in his SANS paper Shedding some light on Voice Authentication.

Besides eliminating passwords, voice recognition offers two additional advantages. Voice recognition:

  • makes it possible for individuals to gain access remotely using existing communication methods, and
  • is less physically intrusive than other techniques such as retinal or fingerprint scans.

How does voice recognition work?

When enrolling users, the first and most critical step is to obtain valid voice recordings from everyone who will need access to the app or device protected by voice-recognition authentication. The recordings are then stored in the authentication system's database (which, hopefully, is encrypted).
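
The enrollment step can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product works: it assumes each recording has already been reduced to a numeric feature vector (feature extraction is out of scope), treats "valid" as "the recordings agree with one another," and uses a hypothetical EnrollmentStore class as a stand-in for the authentication database. The class name, the three-sample minimum, and the 0.9 similarity cutoff are illustrative assumptions.

```python
import math
from statistics import fmean


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class EnrollmentStore:
    """Hypothetical in-memory stand-in for the authentication database.

    A real system would encrypt stored templates; this sketch does not.
    """

    def __init__(self, min_samples=3, consistency_threshold=0.9):
        self.min_samples = min_samples                    # assumed minimum
        self.consistency_threshold = consistency_threshold  # assumed cutoff
        self._templates = {}

    def enroll(self, user_id, recordings):
        """Validate the recordings, average them into a template, and store it."""
        if len(recordings) < self.min_samples:
            raise ValueError(f"need at least {self.min_samples} recordings")
        # Treat the set as valid only if every pair of recordings is similar;
        # an outlier (background noise, a different speaker) fails enrollment.
        for i in range(len(recordings)):
            for j in range(i + 1, len(recordings)):
                sim = cosine_similarity(recordings[i], recordings[j])
                if sim < self.consistency_threshold:
                    raise ValueError("recordings are inconsistent; re-record")
        # The stored template is the per-feature average of the recordings.
        template = [fmean(column) for column in zip(*recordings)]
        self._templates[user_id] = template
        return template

    def template_for(self, user_id):
        """Return the stored template, or None if the user never enrolled."""
        return self._templates.get(user_id)
```

Averaging several recordings smooths out ordinary session-to-session variation, while the pairwise check is one simple way to refuse a bad enrollment before it becomes the reference for every future login.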

When users want access, they speak the prearranged phrase. The authentication software then compares the new sample with the recording registered in the database, looking at two kinds of characteristics:

  • Physiological biometrics: Distinctive physical traits of an individual's voice, such as tone and pitch.
  • Behavioral biometrics: The distinctive way individuals perform specific actions, such as speaking with a particular accent.
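
The comparison can be illustrated with a toy scoring function. The sketch below is an assumption-laden simplification: it represents a voice by three hypothetical, pre-extracted features (average pitch and pitch variability standing in for physiological traits, and speaking rate standing in for behavioral ones, since an accent is hard to reduce to a single number), scales each difference by an illustrative tolerance, and accepts the sample if the average scaled difference stays under a threshold. Production systems use far richer statistical models.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class VoiceFeatures:
    # Physiological traits (hypothetical, assumed already extracted from audio)
    mean_pitch_hz: float
    pitch_variability_hz: float
    # Behavioral trait (hypothetical): speaking rate as a proxy for speaking style
    syllables_per_second: float


# Illustrative tolerances: how far each feature may drift between genuine attempts.
TOLERANCES = VoiceFeatures(mean_pitch_hz=15.0, pitch_variability_hz=8.0,
                           syllables_per_second=0.8)


def mismatch_score(sample: VoiceFeatures, enrolled: VoiceFeatures) -> float:
    """Average per-feature difference, each scaled by its tolerance (0 = perfect match)."""
    scaled = [
        abs(getattr(sample, f.name) - getattr(enrolled, f.name))
        / getattr(TOLERANCES, f.name)
        for f in fields(VoiceFeatures)
    ]
    return sum(scaled) / len(scaled)


def verify(sample: VoiceFeatures, enrolled: VoiceFeatures,
           threshold: float = 1.0) -> bool:
    """Accept if the sample is, on average, within tolerance of the enrollment."""
    return mismatch_score(sample, enrolled) <= threshold


enrolled = VoiceFeatures(180.0, 25.0, 4.5)
print(verify(VoiceFeatures(185.0, 27.0, 4.3), enrolled))  # True: close on every feature
print(verify(VoiceFeatures(120.0, 10.0, 6.0), enrolled))  # False: far off on every feature
```

Note what this implies: anyone who can shift their pitch and speaking rate toward the enrolled values lowers the mismatch score, which is exactly the weakness a skilled impersonator tries to exploit.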

Microsoft has developed a speaker-recognition API as part of its Azure cloud services, and a demo on Microsoft's website shows how the API works. On Microsoft's Speaker Recognition API site, follow these steps:

  1. Select a passphrase from the given list.
  2. Use that phrase and record three audio samples to register your voice with the service (this step is called enrollment).
  3. After your enrollment is completed, you can start the verification step using a different voice recording or phrase to test the service.
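
The demo's enroll-then-verify flow can be modeled as a small text-dependent verifier. This is not Microsoft's API or code: the TextDependentProfile class, the sample passphrases, the assumption that a speech recognizer supplies a transcript of each recording, and the distance cutoff are all hypothetical. In this sketch, a recording of a different phrase is rejected even when the voice matches.

```python
import math
from dataclasses import dataclass

# Hypothetical passphrase list (the real demo offers its own choices).
PASSPHRASES = ("open the vault", "my voice proves who i am", "today is a sunny day")


@dataclass(frozen=True)
class Recording:
    transcript: str    # assumed output of a speech recognizer
    voiceprint: tuple  # assumed pre-extracted speaker features


def _normalize(text):
    """Lowercase and collapse whitespace so phrase comparisons are forgiving."""
    return " ".join(text.lower().split())


class TextDependentProfile:
    REQUIRED_SAMPLES = 3  # step 2: three enrollment recordings

    def __init__(self, passphrase):
        # Step 1: the phrase must come from the offered list.
        if _normalize(passphrase) not in PASSPHRASES:
            raise ValueError("choose a passphrase from the list")
        self.passphrase = _normalize(passphrase)
        self.samples = []

    @property
    def enrolled(self):
        return len(self.samples) >= self.REQUIRED_SAMPLES

    def add_enrollment_sample(self, rec):
        """Step 2: store a recording of the chosen phrase; return samples still needed."""
        if _normalize(rec.transcript) != self.passphrase:
            raise ValueError("enrollment samples must use the chosen passphrase")
        self.samples.append(rec.voiceprint)
        return max(0, self.REQUIRED_SAMPLES - len(self.samples))

    def verify(self, rec, max_distance=1.0):
        """Step 3: accept only the enrolled phrase spoken by a matching voice."""
        if not self.enrolled:
            raise RuntimeError("enrollment is not complete")
        if _normalize(rec.transcript) != self.passphrase:
            return False  # wrong phrase: rejected regardless of the voice
        centroid = tuple(sum(v) / len(v) for v in zip(*self.samples))
        return math.dist(centroid, rec.voiceprint) <= max_distance
```

Checking the phrase and the voice separately makes the design choice visible: a replayed or impersonated voice still has to say the right words, and the right words still have to come from the right voice.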

What are possible security risks with voice recognition?

If biometrics are so good, why are we still using passwords? Part of the answer is that biometric authentication is not infallible.

"Research supports the theory that certain biometric security mechanisms may not be as secure as once thought," writes Conner Forrest in the TechRepublic article Windows face recognition fooled by printed photo. "This could slow the adoption of such technology, especially among business and professional users, just as it was starting to gain more mainstream traction with the release of the iPhone X."

Voice recognition appears to be fallible as well. Rosa González Hautamäki of the University of Eastern Finland states in a university press release that it is possible to fool state-of-the-art voice-recognition systems.

The researchers are not overly concerned about attacks using technical means such as voice conversion, speech synthesis, or replay attacks. "The scientific community is systematically developing techniques and countermeasures against technically-generated attacks," writes González Hautamäki in the press release. "However, voice modifications produced by a human, such as impersonation and voice disguise, cannot be easily detected with the developed countermeasures."

In her Ph.D. dissertation, Human-induced voice modification and speaker recognition, González Hautamäki explains the study in detail. The researchers studied voice patterns from two professional impersonators who mimicked eight well-known public officials in Finland. The study also analyzed voice prints from 60 Finnish individuals who were asked to fake their age by modifying their voices to sound first like an old person and then like a child.

"The study found that impersonators were able to fool automatic systems and listeners when mimicking some speakers," writes González Hautamäki. "In the case of acted speech, a successful strategy for voice modification was to sound like a child, as both automatic systems' and listeners' performance degraded with this type of disguise."
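
Saying that impersonators "fooled" a system has a precise meaning in speaker verification: an impostor's similarity score landed above the system's acceptance threshold. A common way to quantify this, sketched below under the assumption that each trial yields a single score where higher means "more likely the same speaker," is the false acceptance rate (FAR), the false rejection rate (FRR), and the equal error rate (EER) where the two meet. The function names are illustrative, and the scores in the example are invented, not data from the study.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for one acceptance threshold.

    FAR: share of impostor trials accepted (score >= threshold).
    FRR: share of genuine trials rejected (score < threshold).
    """
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by trying every observed score as the threshold.

    Returns (threshold, eer), where eer averages FAR and FRR at the point
    where they are closest.
    """
    best = None
    for t in sorted(set(genuine_scores) | set(impostor_scores)):
        far, frr = error_rates(genuine_scores, impostor_scores, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


# Invented scores for illustration only.
genuine = [0.9, 0.8, 0.85, 0.6]
impostor = [0.2, 0.3, 0.65, 0.1]  # 0.65 could be a skilled impersonation
print(error_rates(genuine, impostor, 0.7))  # (0.0, 0.25)
print(equal_error_rate(genuine, impostor))  # (0.65, 0.25)
```

In these terms, a successful impersonation pushes impostor scores up and raises FAR, while a successful disguise, such as a speaker sounding like a child, pushes that speaker's own scores down and raises FRR; either way, performance at a fixed threshold degrades.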

The study's conclusions are somewhat tentative because of the limited amount of data and variables that were not accounted for. González Hautamäki notes that the impersonated and disguised voices were recorded in relatively quiet environments, not under real-world conditions.

Is multimodal biometric authentication the answer?

With voice-recognition systems becoming more popular, it is important to understand the ramifications of using this type of authentication. The study concludes it is possible for an impersonator to co-opt a voice-recognition system. Something else to consider: Even if the voice-recognition authentication process is 99% accurate, who wants to belong to the 1% where it's not?
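
The 99% figure is worth a quick calculation. Assuming, purely for illustration, that each attempt independently has a 1% chance of going wrong (whether that means locking out the right person or letting in the wrong one), errors add up quickly across many users and many logins. The function names below are hypothetical.

```python
def expected_errors(attempts: int, accuracy: float) -> float:
    """Expected number of wrong decisions across independent attempts."""
    return attempts * (1 - accuracy)


def chance_of_at_least_one_error(logins: int, accuracy: float) -> float:
    """Probability that one user sees at least one wrong decision in `logins` tries."""
    return 1 - accuracy ** logins


# One million logins a day at 99% accuracy: about 10,000 wrong decisions.
print(round(expected_errors(1_000_000, 0.99)))            # 10000
# Two logins a day for 30 days: roughly a 45% chance of at least one error.
print(round(chance_of_at_least_one_error(60, 0.99), 3))   # 0.453
```

Under these assumptions, nearly half of regular users would land in "the 1%" at least once within a month, which is why even high per-attempt accuracy may not be enough on its own.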

The solution could very well be multimodal biometric authentication. Similar to multifactor authentication, multimodal systems increase the difficulty of forging someone's biometric identification by requiring several biometric indicators to agree with the valid biometric information in the authentication database.
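
Two common ways of combining modalities can be sketched as follows. This is an illustration under stated assumptions, not a description of any specific product: decision-level fusion with an AND rule (every modality must accept), and score-level fusion with a weighted average of scores assumed to be normalized to the 0 to 1 range. The error-rate helpers assume the modalities fail independently, which real systems only approximate. The modality names, weights, and rates are hypothetical.

```python
from math import prod


def and_rule(decisions):
    """Decision-level fusion: accept only if every modality accepted."""
    return all(decisions.values())


def weighted_score_fusion(scores, weights, threshold):
    """Score-level fusion: weighted average of normalized scores vs. one threshold."""
    total_weight = sum(weights[m] for m in scores)
    fused = sum(weights[m] * scores[m] for m in scores) / total_weight
    return fused >= threshold


def and_rule_far(fars):
    """An impostor must fool every modality (assumes independent errors)."""
    return prod(fars)


def and_rule_frr(frrs):
    """A genuine user is rejected if any modality rejects (assumes independence)."""
    return 1 - prod(1 - r for r in frrs)


print(and_rule({"voice": True, "face": False}))                     # False
print(weighted_score_fusion({"voice": 0.55, "face": 0.9},
                            {"voice": 0.5, "face": 0.5}, 0.7))     # True
print(and_rule_far([0.01, 0.02]))  # about 0.0002
print(and_rule_frr([0.05, 0.03]))  # about 0.0785
```

The trade-off is visible in the last two lines: requiring every modality to agree cuts false acceptances sharply (an impersonator who fools the voice check still has to fool the face check), but it also raises false rejections, which score-level fusion tries to balance by letting a strong match in one modality offset a weak one in another.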

In the university press release, González Hautamäki offers a final thought: "This issue prompts an interest to improve the robustness of speaker recognition against human-induced voice modifications."

Image: AH86, Getty Images/iStockphoto

About Michael Kassner

Information is my field...Writing is my passion...Coupling the two is my mission.