Learn how spam filtering works: Explore tokenization, the building blocks of spam

Source: No Starch Press

Learn how spam filtering works, and how language classification and machine learning combine to produce remarkably accurate spam filters, in Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification. The book describes in depth how next-generation spam filters use statistical filtering to identify and filter spam. This sample chapter explores tokenization, the process of reducing a message to its colloquial components (individual words, word pairs, or other small chunks of text), which serve as the building blocks of spam.
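
As a rough illustration of what tokenization means here, the following minimal Python sketch reduces a message to individual words and to word pairs. It is not taken from the book: the function names `tokenize` and `word_pairs`, the choice of which characters stay inside a token, and the lowercasing step are all assumptions made for this example.

```python
import re


def tokenize(message):
    """Split a message into lowercase word tokens.

    Assumption for this sketch: letters, digits, and the characters
    $ ! ' - are kept inside tokens; everything else acts as a delimiter.
    """
    return re.findall(r"[a-z0-9$!'-]+", message.lower())


def word_pairs(tokens):
    """Join each pair of adjacent tokens into one two-word token."""
    return [f"{first} {second}" for first, second in zip(tokens, tokens[1:])]


tokens = tokenize("Buy cheap meds now!")
pairs = word_pairs(tokens)

print(tokens)  # ['buy', 'cheap', 'meds', 'now!']
print(pairs)   # ['buy cheap', 'cheap meds', 'meds now!']
```

A statistical filter would then count how often each token appears in spam and in legitimate mail. Which delimiters to use and whether to keep punctuation such as "!" are design choices that a real filter would tune.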

Title: Ending Spam: Bayesian Content Filtering and the Art of Statistical Language Classification
Author: Jonathan A. Zdziarski
Publisher: No Starch Press
Chapter 6: Tokenization: The Building Blocks of Spam
ISBN: 1-593270-52-6; Copyright © 2005 No Starch Press. All rights reserved.
Used with permission from the publisher. Available from booksellers or direct from No Starch Press.
Format: PDF
Size: 612.00
Version: 1.0
Date: May 2007
Price: 0.00
Downloads: 763