Integrating Instance-level and Attribute-level Knowledge into Document Clustering

Jinlong Wang, Shunyao Wu, Gang Li, Zhe Wei

In this paper, we present a document clustering framework incorporating instance-level knowledge in the form of pairwise constraints and attribute-level knowledge in the form of keyphrases. Firstly, we initialize weights based on metric learning with pairwise constraints, then simultaneously learn two kinds of knowledge by combining the distance-based and the constraint-based approaches, finally evaluate and select clustering result based on the degree of users’ satisfaction. The experimental results demonstrate the effectiveness and potential of the proposed method.