Binary Information Press
In the preprocessing stage of state-of-the-art multi-document summarizers, an empirically selected frequency threshold is needed to filter out noise words. The best value of this parameter varies across summarization document sets and depends heavily on the summarization algorithm. It therefore remains a manually selected parameter. In this paper, the authors propose a more robust noise filter based on a supervised binary classifier for words, which makes full use of manually written summaries for word labeling and feature analysis.
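To make the idea concrete, the following is a minimal sketch of a word-level noise filter trained from reference summaries. It is not the authors' implementation: the abstract does not specify the features, the labeling rule, or the classifier, so everything here is an assumption. The two frequency features, the rule "a word is a keep-word if it occurs in a manual summary", the plain logistic regression, and all function names (`tokenize`, `word_features`, `label_words`, `train_logistic`, `keep_word`) are illustrative. The toy documents are likewise invented for the example.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def word_features(docs):
    """Map each word to [bias, log(1 + term frequency), document-frequency ratio].

    These features are assumed for illustration only.
    """
    tf, df = Counter(), Counter()
    for doc in docs:
        tokens = tokenize(doc)
        tf.update(tokens)
        df.update(set(tokens))
    n_docs = len(docs)
    return {w: [1.0, math.log(1 + tf[w]), df[w] / n_docs] for w in tf}

def label_words(features, summaries):
    """Assumed labeling rule: 1 if the word occurs in any manual summary, else 0."""
    summary_vocab = set()
    for summary in summaries:
        summary_vocab.update(tokenize(summary))
    return {w: 1 if w in summary_vocab else 0 for w in features}

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(weights, x):
    """Predicted probability that a word should be kept."""
    return sigmoid(sum(wi * xi for wi, xi in zip(weights, x)))

def train_logistic(features, labels, epochs=500, lr=0.1):
    """Fit logistic regression with plain stochastic gradient ascent."""
    dim = len(next(iter(features.values())))
    weights = [0.0] * dim
    for _ in range(epochs):
        for word, x in features.items():
            err = labels[word] - score(weights, x)
            weights = [wi + lr * err * xi for wi, xi in zip(weights, x)]
    return weights

def keep_word(weights, x, cutoff=0.5):
    """Classifier decision that replaces a hand-tuned frequency threshold."""
    return score(weights, x) >= cutoff

# Toy training set: source documents plus one manually written summary.
docs = [
    "The flood damaged the bridge in the city.",
    "Officials said the flood closed the bridge.",
    "A reporter, however, noted the flood warnings.",
]
summaries = ["Flood damaged and closed the bridge."]

features = word_features(docs)
labels = label_words(features, summaries)
weights = train_logistic(features, labels)
kept = sorted(w for w, x in features.items() if keep_word(weights, x))
```

In this sketch the learned decision boundary plays the role of the frequency threshold: instead of choosing a cutoff by hand for each document set, the weights are fit on sets that have reference summaries and then applied to the words of a new set.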