Big Data

Digging up Social Structures from Documents on the Web

Date Added: Nov 2012
Format: PDF

The authors collected more than ten million Microsoft Office documents from public websites, analyzed the metadata stored in each document, and extracted information related to social activities. Their analysis revealed precisely identified cliques of users who edit, revise, and collaborate on industrial and military content. They also examined cliques in documents downloaded from Fortune 500 company websites, constructed the corresponding graphs, and measured their properties. The graphs contained many connected components and exhibited social-network properties. A priori knowledge of a company's social graph may significantly help an adversary launch targeted attacks, such as targeted advertisements and phishing emails. The study demonstrates the privacy risks associated with metadata by cross-correlating all members identified in a clique with users of Twitter.
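
To make the graph construction concrete, here is a minimal sketch of how document metadata could be turned into a social graph. It is not the authors' method: the abstract does not say which metadata fields were used, so the (creator, last editor) pairs, the sample user names, and the helper functions `build_graph`, `connected_components`, and `is_clique` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical input: one (creator, last_editor) pair per document.
# Which metadata fields the study actually used is an assumption here.
def build_graph(records):
    """Link two users with an undirected edge when both touched the same document."""
    graph = defaultdict(set)
    for creator, last_editor in records:
        graph[creator]       # register the user even if no edge is added
        graph[last_editor]
        if creator != last_editor:
            graph[creator].add(last_editor)
            graph[last_editor].add(creator)
    return graph

def connected_components(graph):
    """Return the connected components as a list of user sets (iterative DFS)."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components

def is_clique(graph, users):
    """True when every pair of the given users is directly connected."""
    users = list(users)
    return all(b in graph[a]
               for i, a in enumerate(users)
               for b in users[i + 1:])

# Made-up sample records, for illustration only.
records = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("dave", "dave"),
    ("erin", "frank"),
]
graph = build_graph(records)
components = connected_components(graph)
```

In this toy input, the three users who all edited each other's documents form both a connected component and a clique, while a user who only ever edited their own documents ends up as an isolated node. Cross-correlating clique members with an external service, as the study does with Twitter, would be a separate step on top of a graph like this.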