University of Illinois at Urbana Champaign
Document representation is a fundamental problem for text mining. Many efforts have been done to generate concise yet semantic representation, such as bag-of-words, phrase, sentence and topic-level descriptions. Nevertheless, most existing techniques counter difficulties in handling monolingual comparable corpus, which is a collection of monolingual documents conveying the same topic. In this paper, the authors propose the use of frame, a high-level semantic unit, and construct frame-based representations to semantically describe documents by bags of frames, using an information network approach.