Image: Generated by Google’s Nano Banana
An unsecured database exposed 4.3 billion LinkedIn-derived records, enabling large-scale phishing and identity-based attacks.
A massive, unsecured database containing billions of professional profiles has been left exposed online, creating one of the largest known leaks of lead-generation data to date.
The dataset — spanning more than 16 terabytes — includes LinkedIn-derived information, contact details, and corporate intelligence that could fuel large-scale phishing, fraud, and reconnaissance campaigns if abused.
“Large datasets like this one are a prime target for malicious actors, as they act as a strong foundational base for profile enrichment and targeted attacks,” Cybernews researchers wrote in a blog post.
The exposure highlights how aggregation itself becomes the primary risk, as consolidating billions of public profiles into a single searchable database sharply lowers the barrier for targeted attacks.
While individual data points may seem low risk alone, aggregating them at scale enables attackers to quickly identify high-value targets and craft convincing social engineering campaigns.
For security teams, this shifts the threat model away from purely technical exploits toward identity-centric abuse, where attackers rely on context and credibility rather than malware to achieve their objectives.
Cybernews researchers discovered an unprotected MongoDB instance containing approximately 4.3 billion records and 16.14 TB of data, placing it among the largest unsecured lead-generation datasets ever identified.
The dataset’s size, structure, and freshness make it well-suited for automated phishing, executive impersonation, and large-scale enterprise reconnaissance.
The exposed database consisted of nine structured MongoDB collections, several of which contained extensive personally identifiable information tied to real individuals.
At least three collections — profiles, unique_profiles, and people — held sensitive data, with one collection alone containing more than 732 million unique records, including associated photographs.
The exposed fields included full names, email addresses, phone numbers, and LinkedIn URLs and profile handles. Additional data covered job titles, employment histories, education records, skills, location information, and linked social media accounts.
Some records also contained enrichment metadata such as email confidence scoring and an Apollo ID, indicating integration with sales intelligence platforms used by marketing and business development teams.
Records within individual collections appeared unique, but researchers noted potential overlap across collections. Timestamps and schema consistency indicated that the data was likely collected or updated within the past two years across multiple geographic regions.
The exposure appears to stem from a common issue: a misconfigured MongoDB database left publicly accessible due to human error rather than sophisticated intrusion.
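As a rough illustration of how this kind of misconfiguration can be caught before deployment, the sketch below checks a mongod configuration for the two settings most often behind publicly reachable instances: listening on all network interfaces and running without access control. The option names net.bindIp, net.bindIpAll, and security.authorization are standard MongoDB configuration settings. The function name audit_mongod_config and the sample configurations are hypothetical, and the code assumes the YAML file has already been parsed into a Python dictionary. Nothing here reflects how the exposed database was actually configured.

```python
from typing import Any

# Addresses that make mongod listen on every network interface.
WILDCARD_BINDS = {"0.0.0.0", "::"}


def audit_mongod_config(config: dict) -> list:
    """Return findings for a parsed mongod configuration (hypothetical helper).

    `config` is assumed to be the mongod YAML file already loaded into a dict.
    """
    findings = []
    net: Any = config.get("net") or {}
    security: Any = config.get("security") or {}

    # net.bindIp is a comma-separated string of addresses or hostnames.
    raw_bind = str(net.get("bindIp", ""))
    bind_ips = {ip.strip() for ip in raw_bind.split(",") if ip.strip()}
    if net.get("bindIpAll") is True or bind_ips & WILDCARD_BINDS:
        findings.append("listens on all network interfaces")

    # Without security.authorization set to "enabled", clients need no credentials.
    if security.get("authorization") != "enabled":
        findings.append("access control (security.authorization) is not enabled")

    return findings


# Illustrative configurations only; neither comes from the reported incident.
exposed = {"net": {"port": 27017, "bindIp": "0.0.0.0"}}
hardened = {
    "net": {"port": 27017, "bindIp": "127.0.0.1"},
    "security": {"authorization": "enabled"},
}

for name, cfg in (("exposed", exposed), ("hardened", hardened)):
    print(name, audit_mongod_config(cfg))
```

A check like this only covers the configuration file. Firewall rules, cloud security groups, and command-line overrides can also expose or protect an instance, so it is best treated as one signal among several.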
Because the dataset reflects automated LinkedIn-style scraping and enrichment, researchers believe the data is accurate and highly valuable for targeted phishing, fraud, and reconnaissance.
When attackers have access to detailed professional profiles, phishing, impersonation, and account takeover attempts become far more effective.
To counter these risks, organizations must focus on protecting identities, detecting abnormal behavior, and limiting the blast radius when credentials are compromised.
Combined, these steps strengthen organizational resilience against data-fueled threat campaigns.
Editor’s note: This article first appeared on our sister publication, eSecurityPlanet.com.