Provided by: Association for Computing Machinery
Topic: Big Data
Although a large and growing literature tackles the semi-supervised clustering problem (i.e., clustering guided by some labeled objects or by constraints such as "Must-link" or "Cannot-link"), the evaluation of semi-supervised clustering approaches has rarely been discussed. Applying cross-validation techniques, for example, is far from straightforward in the semi-supervised setting, and the problems associated with evaluation have yet to be addressed. Here, the authors summarize these problems and provide a solution. Furthermore, to demonstrate the practical applicability of semi-supervised clustering methods, they provide a method for model selection in semi-supervised clustering based on this sound evaluation procedure.
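
To make the cross-validation difficulty concrete, the following minimal Python sketch shows one way pairwise constraints could be kept away from held-out objects. It is an illustration only and not the authors' procedure, which the abstract does not describe. It assumes that constraints are derived from class labels and that a constraint touching a held-out object would leak supervision into the evaluation. The function names (`constraints_from_labels`, `split_folds`, `fold_constraints`) and the round-robin fold assignment are hypothetical choices made for this example.

```python
from itertools import combinations


def constraints_from_labels(labels):
    """Derive pairwise constraints from a dict {object_id: class_label}.

    Same label -> must-link pair, different label -> cannot-link pair.
    """
    must_link, cannot_link = set(), set()
    for a, b in combinations(sorted(labels), 2):
        if labels[a] == labels[b]:
            must_link.add((a, b))
        else:
            cannot_link.add((a, b))
    return must_link, cannot_link


def split_folds(object_ids, n_folds):
    """Assign objects to folds round-robin (deterministic, illustration only)."""
    folds = [[] for _ in range(n_folds)]
    for i, obj in enumerate(sorted(object_ids)):
        folds[i % n_folds].append(obj)
    return folds


def fold_constraints(labels, n_folds):
    """Yield ((must_link, cannot_link), held_out) for each fold.

    Constraints are built only from objects outside the held-out fold,
    so no constraint involves an object later used for evaluation.
    """
    for test_objects in split_folds(labels, n_folds):
        held_out = set(test_objects)
        train_labels = {o: c for o, c in labels.items() if o not in held_out}
        yield constraints_from_labels(train_labels), held_out


if __name__ == "__main__":
    # Toy data: six objects in two classes (made up for this example).
    toy_labels = {0: "a", 1: "a", 2: "b", 3: "b", 4: "a", 5: "b"}
    for (ml, cl), held_out in fold_constraints(toy_labels, n_folds=3):
        touched = {o for pair in ml | cl for o in pair}
        # No constraint may reference a held-out object.
        assert touched.isdisjoint(held_out)
        print(sorted(held_out), sorted(ml), sorted(cl))
```

In this sketch the clustering algorithm would see all objects but only the constraints yielded for the fold, and its cluster assignments for the held-out objects would then be scored against their withheld labels. Whether this matches the evaluation procedure proposed by the authors cannot be determined from the abstract.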