Utilizing High-quality Feature Extension Mode to Classify Chinese Short-text

Executive Summary

This paper presents a method for classifying Chinese short texts that carry a weak concept signal by extracting and applying high-quality feature extension modes. A feature extension mode is defined as a set of terms that co-occur in the training data, and three measures are presented for deciding whether a mode is high-quality: confidence, category homoplasy, and relevancy strength. An algorithm is then designed to extract high-quality feature extension modes from the training data. Finally, a Chinese short-text classification algorithm that uses these modes is presented: a short text is extended by adding new features or by modifying the weights of its initial features, according to the relationship between its non-feature terms and the feature extension modes.
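
To make the idea of a feature extension mode more concrete, the following Python sketch mines two-term co-occurrence modes from labelled training texts and uses them to extend a short text. It is an illustration under stated assumptions, not the paper's algorithm: the summary does not define its three measures, so confidence is assumed here to be the usual association-rule confidence, category homoplasy is approximated by the share of co-occurrences that fall in a single category, and relevancy strength is left out. The function names, thresholds, and the English stand-in tokens (used in place of segmented Chinese terms) are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_modes(docs, min_conf=0.6, min_purity=0.8, min_support=2):
    """Mine two-term extension modes from (terms, category) training pairs.

    Returns a dict mapping (source_term, added_term) -> category.
    All thresholds are illustrative assumptions.
    """
    term_count = Counter()   # number of texts containing each term
    pair_count = Counter()   # number of texts containing each term pair
    pair_cats = {}           # category distribution of each term pair
    for terms, cat in docs:
        terms = set(terms)
        term_count.update(terms)
        for pair in combinations(sorted(terms), 2):
            pair_count[pair] += 1
            pair_cats.setdefault(pair, Counter())[cat] += 1

    modes = {}
    for (a, b), n in pair_count.items():
        if n < min_support:
            continue
        # Assumed proxy for category homoplasy: share of the dominant category.
        cat, cat_n = pair_cats[(a, b)].most_common(1)[0]
        if cat_n / n < min_purity:
            continue
        # Assumed confidence: P(pair | source term), checked in both directions.
        for src, dst in ((a, b), (b, a)):
            if n / term_count[src] >= min_conf:
                modes[(src, dst)] = cat
    return modes


def extend(text_terms, modes):
    """Extend a short text by adding the terms implied by matching modes."""
    extended = set(text_terms)
    for src, dst in modes:
        if src in text_terms:
            extended.add(dst)
    return extended


# Toy training data; English tokens stand in for segmented Chinese terms.
training = [
    ({"stock", "rise"}, "finance"),
    ({"stock", "rise", "market"}, "finance"),
    ({"match", "goal"}, "sports"),
    ({"match", "goal", "market"}, "sports"),
]
modes = extract_modes(training)
print(extend({"stock", "price"}, modes))
```

The sketch only adds new features. The summary also mentions modifying the weights of initial features, which would require a weighted representation such as a term-to-weight mapping rather than a plain set.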

  • Format: PDF
  • Size: 435.66 KB