Whether by design or not, website privacy policies are confusing, convoluted, and long: all reasons why people do not read them. But is skipping them wise?
According to a team of researchers from École Polytechnique Fédérale de Lausanne (EPFL), the University of Wisconsin, and the University of Michigan, not understanding privacy policies is flirting with digital danger. In their co-authored report Polisis: Automated Analysis and Presentation of Privacy Policies Using Deep Learning, Hamza Harkous (EPFL), Kassem Fawaz (Wisconsin), Rémi Lebret (EPFL), Florian Schaub (Michigan), Kang G. Shin (Michigan), and Karl Aberer (EPFL) write:
“Privacy policies are the primary channel through which companies inform users about their data collection and sharing practices. In their current form, policies remain long and difficult to comprehend, thus merely serving the goal of legally protecting the companies.”
If that is the case, we as users either have to trust the organization issuing the privacy policy or read and understand every online privacy policy we encounter. Way back in 2008, privacy experts Aleecia M. McDonald and Lorrie Faith Cranor estimated that the average user would need more than 200 hours to read every privacy policy they come across in a single year.
Thankfully, users now have another option. Harkous, Fawaz, Lebret, Schaub, Shin, and Aberer pooled their expertise to create Polisis (privacy-POLIcy-analySIS), a software program designed to find and flag conditions specific to user privacy and personal-data usage. Polisis employs artificial intelligence (AI) to wade through all the daunting legalese quickly; the software is free to use, and it’s available as a Chrome extension, a Firefox extension, and online at the research team’s website PriBot.org.
SEE: IT leader’s guide to the future of artificial intelligence (Tech Pro Research)
Scalability is a challenge with privacy policies
The researchers began by reviewing existing approaches and determined that, given the sheer volume of policy text involved, scalability is the central challenge. To make the analysis scale, the team divides each privacy policy into small, self-contained fragments of text; Polisis then labels each segment with the user-privacy content it describes.
“Unlike previous research in automatic labeling/analysis of privacy policies, we did not design Polisis to just predict a handful of classes given the entire policy content,” the authors explain in their report. “Instead, Polisis predicts for each segment the set of classes that account for both the high-level aspects and the fine-grained classes of embedded privacy information.”
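To make the segment-then-label idea concrete, here is a minimal sketch that splits a policy into paragraph-sized fragments and assigns a set of labels to each one. The split_into_segments and classify_segment functions are hypothetical stand-ins (simple paragraph splitting and keyword rules), not the team's actual segmenter or trained neural classifiers.

```python
# Minimal sketch of the segment-then-label approach. These are illustrative
# stand-ins, not the Polisis code: Polisis uses its own segmenter and
# trained classifiers rather than keyword rules.
from typing import Dict, List


def split_into_segments(policy_text: str) -> List[str]:
    """Naively split a policy into self-contained fragments (here: paragraphs)."""
    return [p.strip() for p in policy_text.split("\n\n") if p.strip()]


def classify_segment(segment: str) -> List[str]:
    """Placeholder multi-label classifier: keyword rules instead of a trained model."""
    text = segment.lower()
    labels = []
    if "collect" in text:
        labels.append("data-collection")
    if "share" in text or "third part" in text:
        labels.append("data-sharing")
    if "opt out" in text or "choice" in text:
        labels.append("user-choice")
    return labels


def analyze_policy(policy_text: str) -> List[Dict[str, object]]:
    """Return one record per segment, each carrying the set of labels assigned to it."""
    return [{"segment": seg, "labels": classify_segment(seg)}
            for seg in split_into_segments(policy_text)]


if __name__ == "__main__":
    sample = ("We collect your email address when you register.\n\n"
              "We share location data with advertisers; you may opt out at any time.")
    for record in analyze_policy(sample):
        print(record["labels"], "->", record["segment"])
```

Running the sketch on the two-segment sample prints the labels assigned to each fragment, mirroring the kind of per-segment output that drives the visualizations described next.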
Figure A gives a high-level view of Polisis. The key point, according to the research team, is that this per-segment granularity enables scalable queries that are not possible with other methods.
Figure A

Polisis is a user-friendly app
Regarding user friendliness, project lead Hamza Harkous states:
“Our program employs simple graphs and color codes to show people exactly how their data could be used. For instance, some websites share geolocation data for marketing purposes, while others may not fully protect information about children. Such clauses are typically buried deep in their data-protection policies.”
Figure B depicts the team’s web app displaying the results produced by Polisis.
Figure B

The app shows the flow of collected data, the reasons it is collected, and the choices the privacy policy gives the user. Hovering over any item reveals the corresponding policy statements for each type of information, reason, and option.
SEE: Artificial intelligence and privacy engineering: Why it matters NOW (ZDNet)
Ask PriBot about a site’s data-protection policy
Beyond static queries, the researchers developed PriBot, an online chatbot that accepts free-form questions (currently only in English) about a website's data-protection policy, such as: Does it share my credit-card information? In addition to cutting processing time, AI allows PriBot to answer what the report calls non-factoid questions.
“Over the past few years, deep learning has yielded superior results to traditional retrieval techniques in this domain,” add Harkous, Fawaz, Lebret, Schaub, Shin, and Aberer. “Our main contribution is that we build a QA system, without a dataset that includes questions and answers, while achieving results on par with state of the art tools used by other domains.”
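For a rough sense of how questions can be answered without a question-and-answer training set, the sketch below simply ranks a policy's labeled segments against the user's question and returns the best match. The TF-IDF similarity and the answer_question helper shown here are illustrative stand-ins; PriBot ranks candidate segments using representations learned with deep networks, as described in the report.

```python
# Illustrative retrieval-style question answering over policy segments.
# TF-IDF cosine similarity is a stand-in for PriBot's learned representations.
from typing import List

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def answer_question(question: str, segments: List[str]) -> str:
    """Return the policy segment most similar to the user's question."""
    vectorizer = TfidfVectorizer(stop_words="english")
    seg_matrix = vectorizer.fit_transform(segments)       # one row per segment
    q_vector = vectorizer.transform([question])           # the user's question
    scores = cosine_similarity(q_vector, seg_matrix)[0]   # similarity per segment
    return segments[scores.argmax()]


if __name__ == "__main__":
    policy_segments = [
        "We collect your email address when you register.",
        "Payment card details are processed by a third-party provider and are never shared with advertisers.",
        "You may opt out of marketing emails at any time.",
    ]
    print(answer_question("Does it share my credit-card information?", policy_segments))
```

On this toy policy, the credit-card question is matched to the payment-card segment; no question/answer pairs were needed to train anything, which is the property the authors highlight.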
SEE: How to implement AI and machine learning (ZDNet special report) | Download the report as a PDF (TechRepublic)
These privacy policy tools are not perfect
Polisis and PriBot sound like useful tools that let users decide for themselves whether a website's content is worth what they give up in privacy. That usefulness is predicated on Polisis and PriBot being accurate, and the authors speak to that point in the report.
“While PriBot, like Polisis, is not perfect–their results are for information only and offer no legal guarantee–it gives the right answer around 82 percent of the time,” the researchers report. “A respectable score that could make it, along with its sister Polisis, extremely useful for consumers as well as journalists, researchers, and data protection watchdogs.”
SEE: Essential reading for IT leaders: 10 books on cybersecurity (free PDF) (TechRepublic)
Users have choices
According to Harkous, Fawaz, Lebret, Schaub, Shin, and Aberer, we do in fact have choices when it comes to privacy policies, and now there are tools to help determine which option is best.
Harkous said the team is not finished yet; they intend to develop an alert system that notifies users of any unexpected use of their data and create a system for ranking services and connected objects according to their data-protection policies.
