Fraud investigators from credit card companies used to call cardholders and ask if perchance they made such and such a purchase. Now when investigators call, it’s to inform cardholders their credit card information has been stolen and the account frozen, as fraudsters are trying to charge purchases to their account.

That capability might seem Big Brother enough to concern aficionados of George Orwell. Alternatively, Jungwoo Ryoo, associate professor of information sciences and technology at the Pennsylvania State University, suggests, “From the consumers’ perspective, fraud detection can seem magical.”

“The process appears instantaneous, with no human beings in sight,” writes Ryoo in “Machine learning and big data know it wasn’t you who just swiped your credit card,” an essay for The Conversation. “This apparently seamless and instant action involves a number of sophisticated technologies in areas ranging from finance and economics to law and information sciences.”

Ryoo mentions it has not always been this way. When identity fraud first came into its own, detecting it required humans to review the suspicious transactions that detection algorithms flagged. If any doubt cropped up, investigators would call cardholders. “This still happens today although there is a difference in terms of accuracy and scale,” adds Ryoo. “The algorithms in use today can handle more data and faster; making the job of fraud detection less labor-intensive and more accurate. Humans are still in the loop in a number of the fraud detection cases.”

With the commercial success of online merchandising, it became impossible for people to review every suspect transaction. Case in point: Kim S. Nash, in her Wall Street Journal column, mentions that PayPal processes more than 1.1 petabytes of data (equivalent to 200,000 DVDs).

Magic or machine learning?

Ryoo says machine learning is the reason credit card companies can respond in near real-time. As to defining machine learning, that might be a bit tricky. There are an inordinate number of definitions for machine learning floating around the internet. For our purposes, the explanation championed by the Machine Learning Department at Carnegie Mellon University applies nicely: “Machine Learning is a scientific field addressing the question ‘How can we program systems to automatically learn and to improve with experience?’”

The website then explains how machine learning is accomplished:

“To tackle these problems we develop algorithms that discover general conjectures and knowledge from specific data and experience, based on sound statistical and computational principles. We also develop theories of learning processes that characterize the fundamental nature of the computations and experience sufficient for successful learning in machines and humans.”

The Carnegie Mellon description mentions learning. Ryoo relates that to fraud detection. “A machine learning algorithm for fraud detection needs to be trained first by being fed the normal transaction data of lots and lots of cardholders,” he writes. “Transaction sequences are an example of this kind of training data. A person may typically pump gas one time a week, go grocery shopping every two weeks, and so on. The algorithm learns that this is a normal transaction sequence.”
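To make the idea of a "normal transaction sequence" concrete, here is a minimal sketch in Python. It is not the algorithm Ryoo describes or one any card issuer is known to use. It is a simple statistical stand-in that learns the usual gap between one cardholder's gas purchases and flags a gap that falls far outside it. The function names, the sample history, and the tolerance of three standard deviations are all assumptions made for illustration.

```python
from statistics import mean, stdev


def learn_intervals(purchase_days):
    """Return the mean and standard deviation of gaps between purchase days."""
    gaps = [later - earlier for earlier, later in zip(purchase_days, purchase_days[1:])]
    return mean(gaps), stdev(gaps)


def is_unusual(gap, avg, spread, tolerance=3.0):
    """Flag a gap more than `tolerance` spreads away from the learned average."""
    # Use a floor of one day so a very regular history does not flag tiny shifts.
    return abs(gap - avg) > tolerance * max(spread, 1.0)


# Hypothetical history: day numbers on which one cardholder bought gas,
# roughly once a week as in Ryoo's example.
gas_days = [0, 7, 14, 22, 28, 35, 43]
avg, spread = learn_intervals(gas_days)

weekly_fill_up = is_unusual(7, avg, spread)    # a gap that matches the habit
same_day_repeat = is_unusual(0, avg, spread)   # a second fill-up on the same day
```

With this history, the weekly gap passes as normal while the same-day repeat is flagged. A production system would learn from the transactions of many cardholders and many kinds of purchases, not from a single list.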

If developing a machine learning algorithm sounds complex, it is. In Nash’s WSJ piece, she writes that PayPal, between 2008 and 2009, tested several fraud detection packages. None of the platforms were able to provide correct analysis fast enough.

Seeing no other solution, in 2009, PayPal began building its fraud analysis systems incorporating new open-source technologies. “The company uses Hadoop to store data, and related analytics tools, such as the Kraken,” explains Nash. “A data warehouse from Teradata Corp. stores structured data. The PayPal platforms run on both grid and cloud computing infrastructures.”

Ryoo understands why the people at PayPal were unable to use off-the-shelf packages. Analyzing that much data accurately, reliably, and fast enough requires significant fine-tuning, and most likely the fraud detection packages tested by PayPal analysts were proprietary.

Test in real time

Once a fraud detection system is in place and fine-tuned, it’s time to test. “Credit card transactions are run through the algorithm, ideally in real time,” says Ryoo. “It then produces a probability number indicating the possibility of a transaction being fraudulent (for instance, 97%). If the fraud detection system is configured to block any transactions whose score is above, say, 95%, this assessment could immediately trigger a card rejection at the point of sale.”
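The decision rule in Ryoo's example can be sketched in a few lines of Python. The 95% threshold and the 97% score come from the quote above; the function name `decide` and the "approve" and "reject" labels are assumptions made for illustration, and a real system would likely have more outcomes, such as routing a borderline case to a human investigator.

```python
BLOCK_THRESHOLD = 0.95  # from the example in the text: block scores above 95%


def decide(fraud_probability, threshold=BLOCK_THRESHOLD):
    """Return 'reject' when the fraud score exceeds the threshold, else 'approve'."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    return "reject" if fraud_probability > threshold else "approve"


suspicious = decide(0.97)  # the 97% score from the example
routine = decide(0.40)     # a hypothetical low-risk score
```

Where the threshold sits is a business choice: lowering it blocks more fraud but also rejects more legitimate purchases at the register.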

Some factors the algorithm checks:

  • trustworthiness of the vendor;
  • cardholder’s purchasing behavior including time, locations, and IP addresses;
  • buying history;
  • recent activity at the merchant’s site; and
  • information stored in cookies.

What Ryoo says next is significant: “This process makes just-in-time or near real-time fraud detection possible. No person can evaluate thousands of data points simultaneously and make a decision in a split second.”
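As a rough illustration of how factors like those listed above might be combined into a single probability, the Python sketch below uses a logistic function over weighted yes/no signals. Everything specific here is an assumption: the signal names are loose paraphrases of the listed factors, and the weights and bias are invented for the example. A real system would learn its weights from labeled transaction data and would weigh far more than six signals.

```python
import math

# Hypothetical weights; a real system would learn these from labeled data.
WEIGHTS = {
    "untrusted_vendor": 2.0,
    "unusual_time_or_place": 1.5,
    "new_ip_address": 1.0,
    "out_of_pattern_purchase": 1.5,
    "merchant_site_incidents": 1.0,
    "unknown_cookie": 0.5,
}
BIAS = -4.0  # hypothetical baseline: most transactions are legitimate


def fraud_probability(signals):
    """Combine 0/1 risk signals into a probability with a logistic function."""
    z = BIAS + sum(WEIGHTS[name] * float(value) for name, value in signals.items())
    return 1.0 / (1.0 + math.exp(-z))


routine = {name: 0 for name in WEIGHTS}     # nothing looks out of place
suspicious = {name: 1 for name in WEIGHTS}  # every signal fires at once

routine_score = fraud_probability(routine)
suspicious_score = fraud_probability(suspicious)
```

With these made-up weights, the routine transaction scores well below 5% and the one that trips every signal scores above 95%. The arithmetic takes microseconds, which is why software can make the split-second call that no person can.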

Making real-time fraud detection decisions at the “point of sale” would be a huge improvement when one considers the current costs of identity fraud. Analysts at JAVELIN have been tracking the cost of identity fraud since 2003. The company’s 2015 Identity Fraud Report (purchase required) states 12.7 million people in the US alone were victimized (one every two seconds) with the bad guys netting $16 billion.

New regulations say some entity other than the consumer, whether the credit card company or the merchant, pays for identity fraud, but those costs ultimately trickle down to us, the credit card users. Let’s hope advances in machine learning make identity fraud a thing of the past.