Institute of Electrical & Electronic Engineers
In this paper, the authors investigate the problem of assessing the quality of clustering results using semantic annotations provided by experts. They propose a novel approach to constructing an evaluation measure based on the Minimal Description Length (MDL) principle. The proposed measure, called SEE (Semantic Evaluation by Exploration), improves on existing evaluation methods such as the Rand index and normalized mutual information by addressing some of their weaknesses. The authors illustrate the method on freely accessible biomedical research papers from PubMed Central (PMC).
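
For context, the two baseline measures named in the abstract, the Rand index and normalized mutual information (NMI), can be sketched as below. This is a minimal illustration of the existing measures only; it is not the SEE measure and not the authors' code. The function names, the geometric-mean normalization chosen for NMI (other normalizations exist), and the toy labels are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt


def rand_index(labels_a, labels_b):
    """Fraction of item pairs on which two labelings agree
    (both put the pair together, or both keep it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0  # assumption: fewer than two items counts as full agreement
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_a, labels_b):
    """Mutual information (natural log) between two labelings."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    return sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )


def nmi(labels_a, labels_b):
    """NMI with geometric-mean normalization (an assumption of this sketch)."""
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0 or h_b == 0:
        # A single-cluster labeling carries no information.
        return 1.0 if h_a == h_b else 0.0
    return mutual_information(labels_a, labels_b) / sqrt(h_a * h_b)


# Hypothetical toy data: cluster ids from an algorithm vs. expert tags.
clusters = [0, 0, 1, 1, 2, 2]
expert = ["a", "a", "b", "b", "b", "c"]
print(rand_index(clusters, expert))
print(nmi(clusters, expert))
```

Both functions compare a clustering against a single flat labeling, so they assume each document carries exactly one expert label.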