How much can we really know about someone, based on their 140 characters? In a study published on Tuesday in the journal Social Psychological and Personality Science, researchers at the University of Pennsylvania, the Technical University of Darmstadt, and the University of Melbourne examined this question, digging into how stereotypes influence what we think about someone based on their tweets.

In a series of four studies, 3,000 participants guessed the gender, age, education, and politics of 6,000 tweeters by looking at 20 publicly available tweets per account. The tweets were stripped of images or any other markers that might indicate demographics. The researchers asked each participant, who had enrolled in the study via Amazon Mechanical Turk, to read the tweets and make a judgment about the tweeter on one of the four variables. Participants were asked to judge only one marker, to prevent them from being influenced by their other answers.

“We reversed the problem,” said Daniel Preotiuc-Pietro, a researcher at the University of Pennsylvania. Instead of asking people about their stereotypes, which is the normal method, “we wanted to do this in the wild,” he said. When you ask people to name their biases, “people might not be aware of them, or want to present themselves as unbiased,” he said.

Using natural language processing, the researchers could isolate the language cues underlying those stereotypes. The aim was to figure out how stereotyping impacted judgments.
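As a rough illustration of what this kind of analysis involves, consider surfacing words that appear far more often in one group's tweets than another's. The function, the smoothing, and the toy tweets below are all hypothetical, a minimal sketch rather than the paper's actual method:

```python
from collections import Counter

def cue_words(group_a, group_b, min_count=2):
    """Rank words by how much more often they appear in group_a
    than in group_b -- a crude, illustrative stand-in for the
    study's NLP analysis, not the authors' actual method."""
    count_a = Counter(w for tweet in group_a for w in tweet.lower().split())
    count_b = Counter(w for tweet in group_b for w in tweet.lower().split())
    total_a = sum(count_a.values()) or 1
    total_b = sum(count_b.values()) or 1
    scores = {}
    for word, n in count_a.items():
        if n < min_count:
            continue
        # Ratio of add-one-smoothed relative frequencies:
        # values above 1 mean the word leans toward group_a.
        freq_a = (n + 1) / (total_a + 1)
        freq_b = (count_b[word] + 1) / (total_b + 1)
        scores[word] = freq_a / freq_b
    return sorted(scores, key=scores.get, reverse=True)

# Toy data: hypothetical tweets from two groups.
group_a = ["love the new gpu drivers", "compiling the kernel again", "gpu prices are wild"]
group_b = ["brunch was lovely today", "brunch plans this weekend", "lovely weather out"]
print(cue_words(group_a, group_b)[:3])
```

On this toy data, technology words such as "gpu" surface as cues for the first group; the real study worked with far larger vocabularies and statistical controls.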

Participants were correct in their judgments, on average, 68% of the time. Here’s what they found:

  • Gender: 76% of guesses were correct.
  • Age: 69% correctly judged “younger than 24 vs. older than 24.”
  • Political orientation: 82% correctly judged liberal vs. conservative.
  • Education: only 45.5% judged correctly among three choices: no bachelor’s degree, bachelor’s degree, or advanced degree.

The main takeaway? Participants were mostly right, but their stereotypes were exaggerated, Preotiuc-Pietro said. When a tweet contained swearing, for instance, participants judged that it came from a less-educated person. But they applied that logic across the board, so they missed many cases in which that language came from people with advanced degrees.

Also, the characteristics ascribed to one variable often influenced judgments of the others. Machine learning algorithms were trained on, for example, tweets by women to “learn” the characteristics of feminine-sounding tweets. As it turned out, these cues also shaped participants’ decisions in other areas, such as judging political orientation: feminine-sounding tweets were marked as liberal as well, and masculine-sounding ones were judged to be conservative.
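The kind of model described here can be sketched with a minimal bag-of-words classifier. Everything below (the class name, the toy tweets, and the labels) is an illustrative assumption, not the study's actual code:

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Minimal bag-of-words Naive Bayes classifier -- an illustrative
    stand-in for the study's models, not the authors' actual code."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        best, best_lp = None, -math.inf
        total = sum(self.label_counts.values())
        for label in self.label_counts:
            lp = math.log(self.label_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in text.lower().split():
                # Laplace smoothing handles words unseen for this label.
                lp += math.log((self.word_counts[label][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = label, lp
        return best

# Hypothetical training tweets labeled by gender ("m"/"f").
texts = ["new gpu build tonight", "kernel patch merged",
         "brunch with friends", "lovely brunch today"]
labels = ["m", "m", "f", "f"]
clf = TinyNaiveBayes().fit(texts, labels)
print(clf.predict("gpu overclocking tips"))  # leans "m" on this toy data
```

Once such a model has learned which words lean which way, its cues can be compared against human judgments of a *different* attribute, which is how a "feminine-sounding" signal can end up bleeding into guesses about politics.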


The subject matter itself also steered participants toward stereotypes. Technology-related language, for instance, led participants to guess that a man had penned the tweet, which was mostly true. But treating it as always true meant participants missed cases where women wrote about technology.

Also, it’s not as if some participants used stereotypes and others didn’t: across the board, everyone displayed some degree of bias.

The findings have raised further questions that the researchers are now investigating, such as “Are women better at identifying other women?”; that work is currently under review.

The important point, said Preotiuc-Pietro, is to figure out ways to combat stereotypes. And how to do this? “Making people aware of their stereotypes towards certain groups so it can be intervened upon,” Preotiuc-Pietro said. “If we can educate people about the ways these beliefs can steer them wrong, it will make people more socially accurate both online and off.”
