Learn how spam filtering works: Explore tokenization, the building blocks of spam
Source: No Starch Press
Learn how spam filtering works and how language classification and machine learning combine to produce remarkably accurate spam filters in Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification. The book describes in depth how next-generation spam filters use statistical filtering to identify and filter spam. This sample chapter explores tokenization, the process of reducing a message to its basic components (individual words, word pairs, or other small chunks of text), which are the building blocks of spam.
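To make the idea concrete, here is a minimal sketch of tokenization in Python. It is not taken from the book: the function name `tokenize`, the choice of delimiter characters, and the lowercasing step are all assumptions made for illustration. It reduces a message to individual words and adjacent word pairs, two of the component types mentioned above.

```python
import re


def tokenize(message: str) -> list[str]:
    """Return single-word tokens followed by adjacent word-pair tokens.

    Assumption for illustration: a word is a run of letters, digits,
    or apostrophes, and case is ignored.
    """
    words = re.findall(r"[a-z0-9']+", message.lower())
    # Pair each word with the word that follows it.
    pairs = [f"{first} {second}" for first, second in zip(words, words[1:])]
    return words + pairs


tokens = tokenize("Buy cheap meds now")
print(tokens)
```

A real filter would make these choices more carefully (which characters split words, whether case is kept, how headers are handled), since they determine what the classifier gets to see.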
Title: Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification
Author: Jonathan A. Zdziarski
Publisher: No Starch Press
Chapter 6: Tokenization: The Building Blocks of Spam
ISBN: 1-59327-052-6; Copyright © 2005 No Starch Press. All rights reserved.
Used with permission from the publisher. Available from booksellers or directly from No Starch Press.