Nonparametric Detection of Anomalous Data via Kernel Mean Embedding
An anomaly detection problem is investigated in which there are n sequences in total, s of which are anomalous and must be detected. Each normal sequence contains m independent and identically distributed (i.i.d.) samples drawn from a distribution p, whereas each anomalous sequence contains m i.i.d. samples drawn from a distribution q that is distinct from p. The distributions p and q are assumed to be unknown a priori. Two scenarios are studied: one in which a reference sequence generated by p is available, and one in which it is not. Distribution-free tests are constructed using the Maximum Mean Discrepancy (MMD) as the metric; the MMD is based on mean embeddings of distributions into a Reproducing Kernel Hilbert Space (RKHS).
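The abstract does not spell out the test statistics, so the following is only a minimal sketch of how an MMD-based detector of this kind could look. The squared MMD is the squared RKHS distance between the kernel mean embeddings of two distributions, and it can be estimated without bias from samples. Several choices here are assumptions and are not taken from the source: the samples are scalars, the kernel is Gaussian with a hypothetical bandwidth `sigma`, s is known, and the two ranking rules `detect_with_reference` and `detect_without_reference` (flag the s sequences that look most different from the reference, or from the pooled samples of all other sequences) are hypothetical illustrations rather than necessarily the tests constructed in this work.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-(x - y)^2 / (2 sigma^2)) for scalar samples."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mmd2_unbiased(xs, ys, sigma=1.0):
    """Unbiased estimate of squared MMD, i.e. the squared RKHS distance between
    the kernel mean embeddings of the distributions that generated xs and ys."""
    m, n = len(xs), len(ys)
    # Within-sample averages exclude the diagonal (i == j) to remove bias.
    k_xx = sum(gaussian_kernel(xs[i], xs[j], sigma)
               for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    k_yy = sum(gaussian_kernel(ys[i], ys[j], sigma)
               for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    k_xy = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys) / (m * n)
    return k_xx + k_yy - 2.0 * k_xy


def top_s(scores, s):
    """Return the (sorted) indices of the s largest scores."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:s])


def detect_with_reference(reference, sequences, s, sigma=1.0):
    """Hypothetical rule: flag the s sequences with the largest MMD^2 to the reference."""
    return top_s([mmd2_unbiased(seq, reference, sigma) for seq in sequences], s)


def detect_without_reference(sequences, s, sigma=1.0):
    """Hypothetical rule: compare each sequence with the pooled samples of all
    other sequences, then flag the s largest MMD^2 values."""
    scores = []
    for i, seq in enumerate(sequences):
        pooled = [x for j, other in enumerate(sequences) if j != i for x in other]
        scores.append(mmd2_unbiased(seq, pooled, sigma))
    return top_s(scores, s)


if __name__ == "__main__":
    rng = random.Random(0)
    m, n, s = 50, 10, 2
    anomalous = {3, 7}  # illustrative: q is a mean-shifted Gaussian, p is N(0, 1)
    reference = [rng.gauss(0.0, 1.0) for _ in range(m)]
    sequences = [
        [rng.gauss(2.0 if i in anomalous else 0.0, 1.0) for _ in range(m)]
        for i in range(n)
    ]
    print(detect_with_reference(reference, sequences, s))
    print(detect_without_reference(sequences, s))
```

The synthetic data in the demo (a mean shift between two Gaussians) is chosen only to make the anomalies easy to see. Because the kernel acts only on pairs of samples, the same code applies to any p and q without assuming a parametric form, which is the sense in which MMD-based tests are distribution-free.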