Google is reading text in YouTube videos for search crawling without user consent

URLs shown in videos uploaded as private or unlisted are also crawled, but Google's documentation does not acknowledge this behavior at all.


Google is using optical character recognition (OCR) to crawl URLs that appear in YouTube videos, including private videos, according to programmer Austin Burk, whose findings were first reported by Naked Security. Burk had found an XSS vulnerability in a different website and was reproducing it with screen capture software as part of a responsible disclosure package. After uploading the video to YouTube, he found evidence of crawling activity with the user agent "Google-Youtube-Links" in the server logs of a system he controls.

According to Burk, the URLs were visible in the browser's address bar during the video, which was uploaded to YouTube as unlisted. Burk then uploaded a private video to test the behavior, and the crawler responded exactly as it had to the unlisted video created for the responsible disclosure.
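For administrators who want to check their own servers for similar activity, the following is a minimal sketch of how access logs could be searched for that user agent. It assumes logs in the widely used "combined" format; the function name, the regular expression, and the sample log lines are hypothetical illustrations, not Burk's actual method or data.

```python
import re

# Assumption: logs use the "combined" access log format, where the
# user agent is the last double-quoted field on each line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

# User agent string reported by Burk.
CRAWLER_TOKEN = "Google-Youtube-Links"


def find_crawler_hits(lines, token=CRAWLER_TOKEN):
    """Return (time, method, path) for each request whose user agent contains token."""
    hits = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Compare case-insensitively in case the capitalization differs.
        if match and token.lower() in match.group("agent").lower():
            hits.append(
                (match.group("time"), match.group("method"), match.group("path"))
            )
    return hits


# Hypothetical sample lines, invented purely for illustration.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2018:12:00:00 +0000] "GET /demo?x=1 HTTP/1.1" '
    '200 512 "-" "Google-Youtube-Links"',
    '198.51.100.7 - - [01/Jan/2018:12:00:05 +0000] "GET /index.html HTTP/1.1" '
    '200 1024 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    for time, method, path in find_crawler_hits(SAMPLE_LOG):
        print(time, method, path)
```

In practice the sample list would be replaced by the lines of a real log file. Servers that log in a different format would need a different pattern.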

Considering Google's core product is search, it makes sense that the company is always scanning the web. Google's use of users' personal activity, including browsing history and location, to target advertising and search results is well known. But YouTube's help article on video privacy settings makes no mention of this behavior, and Google's help article listing the user agent tokens for its search crawlers does not list this crawler either.


Even if Google's intentions are innocuous, this behavior is potentially very damaging. Burk proposes a scenario similar to the XSS issue he was disclosing:

A security researcher has found a critical vulnerability in a site, and has crafted a URL that will trigger it, causing harmful effects to the website (e.g., a SQL injection vulnerability that will drop the database tables).

During the video, s/he makes mention that they will not visit the URL as it would cause trouble, but it is displayed so that the company they are responsibly disclosing to can remedy it. They upload it as unlisted to YouTube and submit their report. Five minutes later, Google-Youtube-Links comes along and sends two requests to the URL, triggering the SQL injection and rendering the site broken.

For this reason, using YouTube to host videos for security disclosures, even private ones, is not advisable, as the integrity of the disclosure cannot be assured while Google's crawler is requesting the URLs shown in them. It is difficult to fault Google completely for this activity, as malicious actors could use YouTube videos to instruct unwitting victims to manually type links into their address bars, leading them to viruses or illicit content.

That said, Google's complete lack of public documentation or acknowledgement of this behavior should make users uneasy about how the company is using data uploaded to its services.

TechRepublic contacted Google, but did not receive a response by press time. We will update this story if Google provides a statement.

The big takeaways for tech leaders:

  • Google is using optical character recognition (OCR) techniques to crawl URLs found in YouTube videos, including unlisted and private videos.
  • Google's help pages for YouTube and Search Console make no mention of this behavior.
