Clustering Techniques for Establishing Inflectionally Similar Groups of Stems

Provided by: Machine Intelligence Research Labs (MIR Labs)
Topic: Big Data
Format: PDF
In this paper the authors present a hierarchical clustering algorithm that groups stems with similar characteristics. The resulting clusters are expected to contain stems that belong to the same inflectional paradigm (e.g. verbs in passive voice), which supports the creation of a morphological lexicon. The authors also propose a new metric for the distance between data objects. It is designed to suit this application better by addressing problems that arise when the data provide only a limited amount of information.
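
The general idea can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define the proposed metric, so plain Jaccard distance over suffix sets stands in for it, and average linkage is assumed as the merge criterion. The toy data, the function names and the threshold parameter are all hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each stem maps to the set of suffixes seen with it.
STEM_SUFFIXES = {
    "walk": {"s", "ed", "ing"},
    "talk": {"s", "ed", "ing"},
    "jump": {"s", "ed"},
    "cat": {"s"},
    "dog": {"s"},
}


def jaccard_distance(a, b):
    """Stand-in metric (an assumption): 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(c1, c2, data):
    """Mean pairwise distance between the stems of two clusters."""
    total = sum(jaccard_distance(data[x], data[y]) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def cluster_stems(data, threshold):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    until the closest pair is farther apart than `threshold`."""
    clusters = [frozenset([stem]) for stem in sorted(data)]
    while len(clusters) > 1:
        (c1, c2), dist = min(
            (((a, b), average_linkage(a, b, data))
             for a, b in combinations(clusters, 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    print(cluster_stems(STEM_SUFFIXES, threshold=0.4))
```

With few observed forms per stem, a set-overlap distance like this one can separate stems of the same paradigm simply because some of their suffixes were never seen. That is the kind of sparse-data problem the proposed metric is said to address.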
