SHARE

White Hat Hackers Discover Microsoft Leak of 38TB of Internal Data Via Azure Storage

The Microsoft leak, which stemmed from AI researchers sharing open-source training data on GitHub, has been mitigated.

Written By

Sep 19, 2023

Microsoft has patched a vulnerability that exposed 38TB of private data from its AI research division. White hat hackers from cloud security company Wiz discovered a shareable link based on Azure Statistical Analysis System tokens on June 22, 2023. The hackers reported it to the Microsoft Security Response Center, which invalidated the SAS token by June 24 and replaced the token on the GitHub page, where it was originally located, on July 7.

Jump to:

SAS tokens, an Azure file-sharing feature, enabled this vulnerability
What businesses can learn from the Microsoft data leak

SAS tokens, an Azure file-sharing feature, enabled this vulnerability

The hackers first discovered the vulnerability as they searched for misconfigured storage containers across the internet. Misconfigured storage containers are a known backdoor into cloud-hosted data. The hackers found robust-models-transfer, a repository of open-source code and AI models for image recognition used by Microsoft’s AI research division.

The vulnerability originated from a Shared Access Signature token for an internal storage account. A Microsoft employee shared a URL for a Blob store (a type of object storage in Azure) containing an AI dataset in a public GitHub repository while working on open-source AI learning models. From there, the Wiz team used the misconfigured URL to acquire permissions to access the entire storage account.

Must-read security coverage

When the Wiz hackers followed the link, they were able to access a repository that contained disk backups of two former employees’ workstation profiles and internal Microsoft Teams messages. The repository held 38TB of private data, secrets, private keys, passwords and the open-source AI training data.

SAS tokens don’t expire, so they aren’t typically recommended for sharing important data externally. A September 7 Microsoft security blog pointed out that “Attackers may create a high-privileged SAS token with long expiry to preserve valid credentials for a long period.”

Microsoft noted that no customer data was ever included in the information that was exposed, and that there was no risk of other Microsoft services being breached because of the AI data set.

What businesses can learn from the Microsoft data leak

This case isn’t specific to the fact that Microsoft was working on AI training — any very large open-source data set might conceivably be shared in this way. However, Wiz pointed out in its blog post, “Researchers collect and share massive amounts of external and internal data to construct the required training information for their AI models. This poses inherent security risks tied to high-scale data sharing.”

Wiz suggested organizations looking to avoid similar incidents should caution employees against oversharing data. In this case, the Microsoft researchers could have moved the public AI data set to a dedicated storage account.

Organizations should be alert for supply chain attacks, which can occur if attackers inject malicious code into files that are open to public access through improper permissions.

SEE: Use this checklist to make sure you’re on top of network and systems security (TechRepublic Premium)

“As we see wider adoption of AI models within companies, it’s important to raise awareness of relevant security risks at every step of the AI development process, and make sure the security team works closely with the data science and research teams to ensure proper guardrails are defined,” the Wiz team wrote in their blog post.

Ami Luttwak, CTO and cofounder of Wiz, released the following statement to TechRepublic: “As AI adoption increases, so does data sharing. AI is built on collecting and sharing lots of large models and quantities of data, and so what happens is you get high volumes of information flowing between teams. This incident reveals the importance of sharing data in a secure manner. Wiz also recommends security teams gain more visibility into the process of AI research, and work closely with their development counterparts to address risks early and set guardrails.”

When asked for comment, Microsoft directed TechRepublic to their Security Response Center post.

Megan Crouse

Megan Crouse has a decade of experience in business-to-business news and feature writing, including as first a writer and then the editor of Manufacturing.net. Her news and feature stories have appeared in Military & Aerospace Electronics, Fierce Wireless, TechRepublic, and eWeek. She copyedited cybersecurity news and features at Security Intelligence. She holds a degree in English Literature and minored in Creative Writing at Fairleigh Dickinson University.