Date Added: Apr 2010
Systems that learn to detect anomalous email behavior, such as that caused by worms and viruses, tend to build either per-user models or a single global model. Global models leverage a larger training corpus but often model individual users poorly. Per-user models capture fine-grained behaviors but can take a long time to accumulate sufficient training data. Approaches that combine global and per-user information have the potential to address both limitations. The authors use the Latent Dirichlet Allocation model to transition smoothly from the global prior to a particular user's empirical model as the amount of user data grows.
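
The idea of moving smoothly from a global prior to a user's empirical model can be illustrated with a much simpler construction than the one the abstract names: the posterior mean of a multinomial under a Dirichlet prior centered on the global model. The sketch below is not the authors' Latent Dirichlet Allocation model, only a minimal illustration of the same shrinkage behavior. The function name `smoothed_user_model`, the `prior_strength` parameter, and the feature names are hypothetical, and the code assumes the global model assigns a probability to every feature the user can exhibit.

```python
def smoothed_user_model(user_counts, global_probs, prior_strength=10.0):
    """Blend a user's observed counts with a global model (illustrative only).

    Returns the posterior mean of a multinomial whose Dirichlet prior has
    pseudo-counts prior_strength * global_probs[feature].

    Assumptions (not from the source): global_probs sums to 1 and covers
    every feature that appears in user_counts.
    """
    total = sum(user_counts.values())
    return {
        feature: (user_counts.get(feature, 0) + prior_strength * p)
        / (total + prior_strength)
        for feature, p in global_probs.items()
    }


if __name__ == "__main__":
    # Hypothetical behavior features and global probabilities.
    global_model = {"has_attachment": 0.2, "no_attachment": 0.8}

    # A new user with no history falls back on the global model.
    print(smoothed_user_model({}, global_model))

    # A user with a long history is dominated by their own counts.
    history = {"has_attachment": 900, "no_attachment": 100}
    print(smoothed_user_model(history, global_model))
```

With no user data the estimate equals the global model, and as the user's total count grows relative to `prior_strength` the estimate approaches the user's empirical frequencies. The value of `prior_strength` controls how quickly that handover happens and would need tuning in practice.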