Corpus-based techniques are increasingly used to compare language usage. This paper presents a method for comparing corpora using frequency profiling. The method identifies key words that differentiate one corpus from another and, when applied to annotated corpora, key grammatical or word-sense categories, so that differences between the corpora can be found quickly. The paper demonstrates the method at the word level, the part-of-speech tag level, and the semantic tag level, and lists several applications: the study of social differentiation in the use of English vocabulary, the profiling of learner English, and document analysis in the software engineering process. It also outlines future work, namely a more precise specification of the reliability of the statistical tests (log-likelihood (LL), Pearson's χ², and others) under the effects of corpus size, the ratio of the sizes of the corpora being compared, and word (or tag) frequency. The method is not fully automated. It only suggests a list of key items, in decreasing order of significance, that distinguish one corpus from another. The researcher can then use this list to investigate occurrences of the significant items in the corpora using standard corpus techniques such as KWIC (key word in context).
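The ranking step described above can be illustrated with a minimal sketch. It assumes the commonly used two-corpus log-likelihood statistic, LL = 2 Σ O_i ln(O_i / E_i), where the expected frequency in each corpus is that corpus's size times the item's pooled relative frequency. The abstract does not give the formula, so this is an assumption rather than the authors' implementation, and the function names `log_likelihood` and `key_items` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, n1, n2):
    """Log-likelihood for an item seen a times in corpus 1 (n1 tokens)
    and b times in corpus 2 (n2 tokens). Assumed formula, see lead-in."""
    # Expected frequencies if the item were equally likely in both corpora.
    e1 = n1 * (a + b) / (n1 + n2)
    e2 = n2 * (a + b) / (n1 + n2)
    ll = 0.0
    # A zero observed count contributes nothing to the sum (0 * ln 0 -> 0).
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def key_items(tokens1, tokens2):
    """Return (item, freq1, freq2, LL) tuples sorted by decreasing LL.

    The tokens may be words, part-of-speech tags, or semantic tags.
    """
    f1, f2 = Counter(tokens1), Counter(tokens2)
    n1, n2 = len(tokens1), len(tokens2)
    scored = [
        (item, f1[item], f2[item], log_likelihood(f1[item], f2[item], n1, n2))
        for item in set(f1) | set(f2)
    ]
    return sorted(scored, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    # Toy token lists, invented purely for illustration.
    corpus_a = "the cat sat on the mat with the cat".split()
    corpus_b = "the dog sat on the log with the dog".split()
    for item, fa, fb, ll in key_items(corpus_a, corpus_b):
        print(f"{item}\t{fa}\t{fb}\t{ll:.2f}")
```

Items at the top of the list are the candidates a researcher would then inspect in context, for example with a KWIC concordance. The same function works unchanged on tag sequences, since it only counts tokens.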
- Format: PDF
- Size: 180.3 KB