4.3B LinkedIn-Style Records Found in One of the Largest Data Exposures Ever

An unsecured database exposed 4.3 billion LinkedIn-derived records, enabling large-scale phishing and identity-based attacks.

Written By

Ken Underhill

Dec 16, 2025

A massive, unsecured database containing billions of professional profiles has been left exposed online, creating one of the largest known leaks of lead-generation data to date.

The dataset — spanning more than 16 terabytes — includes LinkedIn-derived information, contact details, and corporate intelligence that could fuel large-scale phishing, fraud, and reconnaissance campaigns if abused.

“Large datasets like this one are a prime target for malicious actors, as they act as a strong foundational base for profile enrichment and targeted attacks,” Cybernews researchers wrote in a blog post.

How aggregated data fuels targeted attacks

The exposure highlights how aggregation itself becomes the primary risk, as consolidating billions of public profiles into a single searchable database sharply lowers the barrier for targeted attacks.

While individual data points may seem low risk alone, aggregating them at scale enables attackers to quickly identify high-value targets and craft convincing social engineering campaigns.

For security teams, this shifts the threat model away from purely technical exploits toward identity-centric abuse, where attackers rely on context and credibility rather than malware to achieve their objectives.

Cybernews researchers discovered an unprotected MongoDB instance holding approximately 4.3 billion records totaling 16.14 TB, placing it among the largest unsecured lead-generation datasets ever identified.

The dataset’s size, structure, and freshness make it well-suited for automated phishing, executive impersonation, and large-scale enterprise reconnaissance.

Inside the 4.3 billion-record data exposure

The exposed database consisted of nine structured MongoDB collections, several of which contained extensive personally identifiable information tied to real individuals.

At least three collections — profiles, unique_profiles, and people — held sensitive data, with one collection alone containing more than 732 million unique records, including associated photographs.

The exposed fields included full names, email addresses, phone numbers, LinkedIn URLs, and profile handles. Additional data covered job titles, employment histories, education records, skills, location information, and linked social media accounts.

Some records also contained enrichment metadata such as email confidence scoring and an Apollo ID, indicating integration with sales intelligence platforms used by marketing and business development teams.

While records within individual collections appeared unique, researchers noted potential overlap across collections. Timestamps and schema consistency indicate the data was likely collected or updated within the past two years, across multiple geographic regions.

The exposure appears to stem from a common issue: a misconfigured MongoDB database left publicly accessible due to human error rather than sophisticated intrusion.
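The report, as summarized here, does not say which setting was wrong. As a minimal sketch, assuming the common pattern in which a mongod process listens on all network interfaces without access control, the check below inspects an already-parsed mongod.conf mapping. The function name audit_mongod_config and the finding messages are hypothetical; net.bindIp, net.bindIpAll, and security.authorization are standard MongoDB configuration options.

```python
from typing import Any

# Addresses that make mongod listen on every interface (IPv4 and IPv6).
WILDCARD_ADDRESSES = {"0.0.0.0", "::"}


def audit_mongod_config(config: dict[str, Any]) -> list[str]:
    """Return findings for a mongod.conf that has been parsed into a dict.

    Hypothetical helper for illustration; it checks only two settings and
    is not a complete hardening review.
    """
    findings: list[str] = []
    net = config.get("net") or {}
    security = config.get("security") or {}

    # bindIp may hold several comma-separated addresses; mongod binds to
    # localhost when the option is absent.
    bind_ip = str(net.get("bindIp") or "localhost")
    addresses = {part.strip() for part in bind_ip.split(",")}

    if net.get("bindIpAll") or addresses & WILDCARD_ADDRESSES:
        findings.append("reachable on all network interfaces")

    if security.get("authorization") != "enabled":
        findings.append("access control (security.authorization) is not enabled")

    return findings


if __name__ == "__main__":
    # Example: a config resembling the exposure pattern assumed above.
    exposed = {"net": {"port": 27017, "bindIp": "0.0.0.0"}}
    for finding in audit_mongod_config(exposed):
        print("WARNING:", finding)
```

A check like this only covers the configuration file. Firewall rules, cloud security groups, and command-line overrides can also expose or protect an instance, so it would complement an external scan rather than replace one.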

Because the dataset reflects automated LinkedIn-style scraping and enrichment, researchers believe the data is accurate and highly valuable for targeted phishing, fraud, and reconnaissance.

How to reduce risk from identity-based threats

When attackers have access to detailed professional profiles, phishing, impersonation, and account takeover attempts become far more effective.

To counter these risks, organizations must focus on protecting identities, detecting abnormal behavior, and limiting the blast radius when credentials are compromised:

Harden email security with behavioral analysis and impersonation detection to stop highly personalized phishing attempts.
Enforce phishing-resistant MFA and least-privilege access to reduce the impact of credential exposure.
Monitor identity, SaaS, and network activity for credential abuse, anomalous logins, and behavior inconsistent with normal user patterns.
Apply conditional access policies and device posture checks to limit access following risky or suspicious activity.
Audit third-party vendors and prepare identity-focused incident response playbooks for rapid credential rotation and containment.

Combined, these steps strengthen organizational resilience against data-fueled threat campaigns.
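To make the impersonation-detection step more concrete, here is a minimal sketch of one simple heuristic: flag mail whose display name matches a known executive while the sending domain is not one the organization trusts. The function name, the executive list, and the domains are hypothetical, and commercial email security products rely on far richer behavioral signals than this.

```python
from email.utils import parseaddr


def looks_like_impersonation(
    from_header: str,
    executive_names: set[str],
    trusted_domains: set[str],
) -> bool:
    """Return True when the display name matches an executive but the
    sender's domain is not trusted.

    executive_names must be lowercase with single spaces between words;
    trusted_domains must be lowercase.
    """
    display_name, address = parseaddr(from_header)

    # Normalize case and whitespace so "JANE   DOE" matches "jane doe".
    normalized_name = " ".join(display_name.lower().split())

    # Everything after the last "@" is treated as the domain.
    domain = address.rpartition("@")[2].lower()

    return normalized_name in executive_names and domain not in trusted_domains


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    executives = {"jane doe"}
    trusted = {"example.com"}
    header = '"Jane Doe" <jane.doe@examp1e.net>'
    print(looks_like_impersonation(header, executives, trusted))
```

Exact-name matching is deliberately strict here to keep false positives low. Attackers who vary the name slightly (for example, adding a middle initial) would slip past it, which is one reason the behavioral analysis mentioned above matters.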

Editor’s note: This article first appeared on our sister publication, eSecurityPlanet.com.

Ken Underhill

Ken Underhill is an award-winning cybersecurity professional, bestselling author, and seasoned IT professional. He holds a graduate degree in cybersecurity and information assurance from Western Governors University and brings years of hands-on experience to the field.
