Date Added: Jan 2010
NLP systems for tasks such as question answering and information extraction typically rely on statistical parsers, but the accuracy of such parsers can be surprisingly low, particularly on sentences drawn from heterogeneous corpora such as the Web. Web-based semantic filtering addresses this problem. The fundamental hypothesis is that incorrect parses often yield wildly implausible semantic interpretations of sentences, which can, in certain circumstances, be detected automatically. The NLP literature is replete with systems that produce semantic interpretations and use semantics to improve understanding; several systems in the 1970s and 1980s used hand-built augmented transition networks or semantic networks to prune bad semantic interpretations. The semantic filter consists of two components: a semantic interpreter that converts a parse tree into a conjunction of first-order predicates, and a sequence of four increasingly sophisticated methods that check the semantic plausibility of those conjuncts on the Web. Traditional statistical parsers also use co-occurrence of lexical heads as features for making parse decisions; this paper expands on that idea in two ways. Woodward first constructs a representation that identifies the key semantic relationships implicit in the parse, then uses a set of Web-based sampling techniques to check whether these relationships are plausible. If any relationship is highly implausible, Woodward concludes that the parse is incorrect.
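The plausibility checks described above can be sketched with a pointwise-mutual-information score over web hit counts: if a predicate's arguments co-occur on the Web far less often than chance would predict, the conjunct is flagged as implausible. This is a minimal illustration, not Woodward's actual four methods; the hit counts, the function names, and the zero threshold here are all hypothetical.

```python
import math

def pmi(hits_xy, hits_x, hits_y, total):
    """Pointwise mutual information from (hypothetical) web hit counts.

    hits_xy -- count of pages where both terms co-occur
    hits_x, hits_y -- counts of pages containing each term alone
    total -- assumed size of the indexed corpus
    """
    p_xy = hits_xy / total
    p_x = hits_x / total
    p_y = hits_y / total
    return math.log(p_xy / (p_x * p_y))

def plausible(hits_xy, hits_x, hits_y, total, threshold=0.0):
    # Flag a conjunct as implausible when its PMI falls below an
    # (assumed) threshold; a real filter would calibrate this value.
    return pmi(hits_xy, hits_x, hits_y, total) >= threshold
```

For example, two terms that co-occur 100 times in a million pages, each appearing 1,000 times overall, score PMI = log(100) and pass the check, while a pair that co-occurs only once despite each term appearing 100,000 times scores strongly negative and is rejected.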