Cluster labelling using chi-square-based keyword ranking and mutual information score: a hybrid approach
Online publication date: Tue, 14-Mar-2017
by Rajendra Kumar Roul; Sanjay Kumar Sahay
International Journal of Intelligent Systems Design and Computing (IJISDC), Vol. 1, No. 1/2, 2017
Abstract: Cluster labelling gives end users useful information about the contents of a cluster. This paper proposes a novel approach that follows up on our previous work. That earlier work generates clusters of web documents using a modified Apriori approach, which is more efficient and faster than the traditional Apriori approach. To label the clusters, the proposed approach uses an effective feature selection technique that selects the top features of each cluster. Rather than labelling a cluster with a 'bag of words', a concept-driven mechanism takes the top features of the cluster as input and uses Wikipedia to generate candidate labels. The candidate labels are then ranked by their mutual information (MI) score, and the topmost candidates are considered potential labels of the cluster. Experimental results on two benchmark datasets demonstrate the efficiency of our approach.
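The two scoring steps named in the title can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the code below assumes the standard textbook definitions of the chi-square statistic and of mutual information over a 2x2 term/cluster contingency table. The function names (`contingency`, `top_terms`, `rank_labels`) are hypothetical, documents are assumed to be sets of tokens, and the candidate labels are simply passed in by the caller rather than generated from Wikipedia.

```python
import math


def contingency(docs, in_cluster, term):
    """Count documents by (term present/absent) x (inside/outside cluster)."""
    n11 = n10 = n01 = n00 = 0
    for tokens, inside in zip(docs, in_cluster):
        has = term in tokens
        if has and inside:
            n11 += 1
        elif has:
            n10 += 1
        elif inside:
            n01 += 1
        else:
            n00 += 1
    return n11, n10, n01, n00


def chi_square(n11, n10, n01, n00):
    """Textbook chi-square statistic for a 2x2 table (assumed definition)."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def mutual_information(n11, n10, n01, n00):
    """Textbook mutual information (in bits) between term and cluster."""
    n = n11 + n10 + n01 + n00
    cells = (
        (n11, n11 + n10, n11 + n01),  # term present, inside cluster
        (n10, n11 + n10, n10 + n00),  # term present, outside cluster
        (n01, n01 + n00, n11 + n01),  # term absent, inside cluster
        (n00, n01 + n00, n10 + n00),  # term absent, outside cluster
    )
    total = 0.0
    for nij, row, col in cells:
        if nij > 0:
            total += (nij / n) * math.log2(n * nij / (row * col))
    return total


def top_terms(docs, in_cluster, k):
    """Rank terms that occur in the cluster by chi-square, keep the top k."""
    vocab = {t for tokens, inside in zip(docs, in_cluster) if inside for t in tokens}
    scored = []
    for t in vocab:
        n11, n10, n01, n00 = contingency(docs, in_cluster, t)
        # Chi-square is symmetric, so keep only positively associated terms.
        if n11 * n00 > n10 * n01:
            scored.append((chi_square(n11, n10, n01, n00), t))
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [t for _, t in scored[:k]]


def rank_labels(docs, in_cluster, candidates):
    """Order caller-supplied candidate labels by MI with cluster membership."""
    scored = [(mutual_information(*contingency(docs, in_cluster, c)), c) for c in candidates]
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [c for _, c in scored]


docs = [{"python", "code"}, {"python", "snake"}, {"football", "goal"}, {"football", "match"}]
in_cluster = [True, True, False, False]
print(top_terms(docs, in_cluster, 2))
print(rank_labels(docs, in_cluster, ["code", "python"]))
```

In this toy corpus the first two documents form the cluster. The chi-square step picks the terms most characteristic of the cluster, and the MI step orders whatever candidate labels are supplied; in the paper those candidates come from Wikipedia, a step this sketch does not attempt to reproduce.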