IRILD: An Information Retrieval Based Method for Information Leak Detection
The traditional approach to detecting information leaks is to generate fingerprints of sensitive data by partitioning and hashing it, and then to compare these fingerprints against outgoing documents. Unfortunately, this approach incurs a high computational cost because every part of a document needs to be checked. As a result, it does not scale to systems with a large number of documents that need to be protected. Additionally, the approach is prone to false positives when the fingerprints correspond to common phrases. In this paper, the authors propose an improvement to this approach that offers much faster processing with fewer false positives. The core idea of their solution is to eliminate common phrases and non-sensitive phrases from the fingerprinting process.
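
To make the fingerprinting idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical partitioning into overlapping k-word phrases, SHA-256 as the hash, and a caller-supplied list of common texts whose phrases are excluded from the fingerprint set. The function names (`shingles`, `fingerprint`, `build_fingerprints`, `leak_score`) and the scoring rule are illustrative assumptions only.

```python
import hashlib
import re


def shingles(text, k=3):
    """Partition text into overlapping k-word phrases (assumed scheme)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def fingerprint(phrase):
    """Hash one phrase; SHA-256 truncated to 16 hex digits is an arbitrary choice."""
    return hashlib.sha256(phrase.encode("utf-8")).hexdigest()[:16]


def build_fingerprints(sensitive_docs, common_texts=(), k=3):
    """Fingerprint sensitive documents, skipping phrases found in common_texts."""
    excluded = {p for t in common_texts for p in shingles(t, k)}
    return {
        fingerprint(p)
        for doc in sensitive_docs
        for p in shingles(doc, k)
        if p not in excluded
    }


def leak_score(outgoing, fingerprints, k=3):
    """Fraction of the outgoing document's phrases that match a fingerprint."""
    phrases = shingles(outgoing, k)
    if not phrases:
        return 0.0
    hits = sum(1 for p in phrases if fingerprint(p) in fingerprints)
    return hits / len(phrases)


# Illustrative data, invented for this sketch.
sensitive = ["please find attached the merger price of acme is ninety dollars"]
common = ["please find attached the"]

naive = build_fingerprints(sensitive)
filtered = build_fingerprints(sensitive, common_texts=common)

benign = "please find attached the lunch menu"
leak = "the merger price of acme is ninety dollars"

print(leak_score(benign, naive), leak_score(benign, filtered))
print(leak_score(leak, naive), leak_score(leak, filtered))
```

In this toy example the unfiltered fingerprint set flags the benign message because it shares a stock phrase with the sensitive document, while the filtered set does not; the message that actually leaks content matches under both. The filtered set is also smaller, which is the intuition behind the reduced lookup cost.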