Relieving the Data Acquisition Bottleneck in Word Sense Disambiguation
Supervised learning methods for WSD yield better performance than unsupervised methods. Yet the availability of clean training data for the former is still a severe challenge. This paper presents an unsupervised bootstrapping approach for WSD which exploits huge amounts of automatically generated noisy data for training within a supervised learning framework. The method is evaluated using the 29 nouns in the English Lexical Sample task of SENSEVAL2. The algorithm does as well as supervised algorithms on 31% of this test set, which is an improvement of 11% (absolute) over state-of-the-art bootstrapping WSD algorithms. The paper identifies seven different factors that impact the performance of the system.