Whether by design or not, website privacy policies are confusing, convoluted, and long—all reasons why people do not read them. But is that wise?
According to a team of researchers from Ecole Polytechnique Federale de Lausanne (EPFL), the University of Wisconsin, and the University of Michigan, not understanding privacy policies is flirting with digital danger. In their co-authored report Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning, Hamza Harkous (EPFL), Kassem Fawaz (Wisconsin), Remi Lebret (EPFL), Florian Schaub (Michigan), Kang G. Shin (Michigan), and Karl Aberer (EPFL) write:
"Privacy policies are the primary channel through which companies inform users about their data collection and sharing practices. In their current form, policies remain long and difficult to comprehend, thus merely serving the goal of legally protecting the companies."
Thankfully, users now have another option. Harkous, Fawaz, Lebret, Schaub, Shin, and Aberer pooled their expertise to create Polisis (privacy-POLIcy-analySIS), a software program designed to find and flag conditions specific to user privacy and personal-data usage. Polisis employs artificial intelligence (AI) to wade through all the daunting legalese quickly; the software is free to use, and it's available as a Chrome extension, a Firefox extension, and online at the research team's website PriBot.org.
Scalability is a challenge with privacy policies
"Unlike previous research in automatic labeling/analysis of privacy policies, we did not design Polisis to just predict a handful of classes given the entire policy content," explain the authors in their report. "Instead, Polisis predicts for each segment the set of classes that account for both the high-level aspects and the fine-grained classes of embedded privacy information."
Figure A is a high-level view of Polisis. The key point, according to the research team, is that Polisis' segment-level granularity allows scalable queries that are not possible with other methods.
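To make the segment-level idea concrete, here is a minimal Python sketch of labeling each segment of a policy with both high-level and fine-grained classes, then querying across the labeled segments. Everything in it is an assumption for illustration: the category names, the keyword cues, and the helper functions `split_segments`, `label_segment`, and `find_segments` are hypothetical, and simple keyword matching stands in for the deep-learning classifiers the report describes.

```python
# Illustrative sketch only: the categories and keyword cues below are
# assumptions made for this example, not Polisis' real taxonomy or model.
HIGH_LEVEL_CUES = {
    "first-party-collection": ("we collect", "we gather"),
    "third-party-sharing": ("share", "disclose"),
}
FINE_GRAINED_CUES = {
    "information-type": {
        "location": ("location", "gps"),
        "financial": ("credit card", "payment"),
    },
    "purpose": {
        "advertising": ("advertis", "marketing"),
    },
}


def split_segments(policy: str) -> list[str]:
    """Split a policy into paragraph-sized segments on blank lines."""
    return [part.strip() for part in policy.split("\n\n") if part.strip()]


def label_segment(segment: str) -> dict:
    """Label one segment; keyword matching stands in for a trained classifier."""
    text = segment.lower()
    high = {
        name
        for name, cues in HIGH_LEVEL_CUES.items()
        if any(cue in text for cue in cues)
    }
    fine = {}
    for attribute, values in FINE_GRAINED_CUES.items():
        hits = {
            value
            for value, cues in values.items()
            if any(cue in text for cue in cues)
        }
        if hits:
            fine[attribute] = hits
    return {"text": segment, "high_level": high, "fine_grained": fine}


def find_segments(labeled: list[dict], high_level: str,
                  attribute: str, value: str) -> list[str]:
    """Return segments carrying both a high-level and a fine-grained label."""
    return [
        item["text"]
        for item in labeled
        if high_level in item["high_level"]
        and value in item["fine_grained"].get(attribute, set())
    ]
```

Because labels are attached to individual segments rather than to the whole policy, a query such as `find_segments(labeled, "third-party-sharing", "information-type", "financial")` can point to the exact passage that mentions sharing payment data, which is the kind of fine-grained lookup the authors say whole-policy classification cannot support.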
Polisis is a user-friendly app
As to user friendliness, project lead Hamza Harkous states:
"Our program employs simple graphs and color codes to show people exactly how their data could be used. For instance, some websites share geolocation data for marketing purposes, while others may not fully protect information about children. Such clauses are typically buried deep in their data-protection policies."
Figure B depicts the team's web app displaying the results produced by Polisis.
Ask PriBot about a site's data-protection policy
In addition to static queries, the researchers developed PriBot, an online chatbot that accepts questions (currently only in English) about a website's data-protection policy, such as: Does it share my credit-card information? AI not only shortens processing time but also lets PriBot answer what the report calls non-factoid questions, that is, questions answered by a passage of text rather than a single short fact.
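The general shape of answering a question from a policy can be sketched as a ranking problem: score every segment against the question and return the best match. The Python below is a rough illustration under stated assumptions, not PriBot's method. It swaps in classical bag-of-words cosine similarity for the deep-learning models the researchers use, and the function names `tokenize`, `cosine`, and `rank_segments` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_segments(question: str, segments: list[str],
                  top_k: int = 1) -> list[tuple[float, str]]:
    """Return the top_k (score, segment) pairs most similar to the question.

    Assumption: word overlap is a crude stand-in for the learned
    representations a real system would compare.
    """
    question_vector = Counter(tokenize(question))
    scored = [
        (cosine(question_vector, Counter(tokenize(segment))), segment)
        for segment in segments
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Note that nothing here is trained on question-and-answer pairs: the answer is simply the policy segment that scores highest against the question. Word overlap fails as soon as the user's wording differs from the policy's legalese, which is one reason a learned model is attractive for this task.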
"Over the past few years, deep learning has yielded superior results to traditional retrieval techniques in this domain," add Harkous, Fawaz, Lebret, Schaub, Shin, and Aberer. "Our main contribution is that we build a QA system, without a dataset that includes questions and answers, while achieving results on par with state of the art tools used by other domains."
Polisis and PriBot sound like useful tools that let users decide for themselves whether the website's content is worth what they give up in privacy. That is predicated on Polisis and PriBot being accurate—the authors speak to that point in the report.
"While PriBot, like Polisis, is not perfect—their results are for information only and offer no legal guarantee—it gives the right answer around 82 percent of the time," the researchers report. "A respectable score that could make it, along with its sister Polisis, extremely useful for consumers as well as journalists, researchers, and data protection watchdogs."
Users have choices
According to Harkous, Fawaz, Lebret, Schaub, Shin, and Aberer, we do, in fact, have choices when it comes to privacy policies, and there are now tools to help users determine which choice is best for them.
Harkous said the team is not finished yet: they intend to develop an alert system that notifies users of any unexpected use of their data and to create a system for ranking services and connected objects according to their data-protection policies.