Title: Modelling on microblog posts clustering based on iteration feature selection and abstractive summarisation
Authors: Kai Gao; Bao-quan Zhang
Addresses: School of Information Science and Engineering, Hebei University of Science and Technology, Hebei 050018, China; School of Information Science and Engineering, Hebei University of Science and Technology, Hebei 050018, China
Abstract: With the advent of the big data era, data mining and intelligent processing are becoming increasingly important, and new models for big data processing are needed. Because microblog posts are short and linguistically unreliable, it is necessary to analyse them and cluster similar posts together to support further data mining and recommendation. This paper uses the classical k-means clustering algorithm and presents a novel modelling approach that partitions microblog posts into k groups of similar posts. Furthermore, a feature selection model based on two-phase iteration is proposed, and a clustering algorithm is built on this model. The proposed algorithm makes use of the partitioning idea and avoids the influence of outliers and noisy data. Lastly, a cluster abstractive summarisation approach is presented to summarise each individual cluster, making it easy for users to grasp the main content of a cluster. Experimental results show the feasibility of the approach, and some existing problems and directions for further work are also presented at the end.
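The abstract does not give the details of the authors' feature selection or clustering algorithm. As a rough illustration of the baseline k-means partitioning step only, the sketch below assumes whitespace tokenisation, raw term-frequency vectors, cosine similarity, and the first k posts as initial centroids (chosen for reproducibility). It does not implement the paper's two-phase iterative feature selection, its outlier handling, or its abstractive summarisation, and the function names (`tokenize`, `vectorize`, `cosine`, `kmeans`) and sample posts are hypothetical.

```python
from collections import Counter
import math


def tokenize(post):
    # Hypothetical tokeniser: lowercase and split on whitespace.
    return post.lower().split()


def vectorize(posts):
    # Build raw term-frequency vectors over a shared, sorted vocabulary.
    vocab = sorted({t for p in posts for t in tokenize(p)})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for p in posts:
        v = [0.0] * len(vocab)
        for term, count in Counter(tokenize(p)).items():
            v[index[term]] = float(count)
        vectors.append(v)
    return vectors


def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(vectors, k, max_iter=100):
    # Assumption: the first k vectors serve as initial centroids.
    centroids = [list(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assign each post to the most similar centroid.
        new_labels = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Recompute each centroid as the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


posts = [
    "new phone release today",
    "football match tonight",
    "phone battery release review",
    "football team wins match",
]
labels = kmeans(vectorize(posts), k=2)
print(labels)  # [0, 1, 0, 1]
```

Cosine similarity is used here rather than Euclidean distance because it is a common choice for sparse term vectors of short texts; the paper's own similarity measure may differ.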
Keywords: microblogging; data mining; feature selection; text clustering; similarity; modelling; microblog posts; abstractive summarisation; big data; k-means clustering; iteration.
DOI: 10.1504/IJMIC.2015.071886
International Journal of Modelling, Identification and Control, 2015 Vol.24 No.2, pp.110 - 119
Received: 13 Jan 2015
Accepted: 17 Feb 2015
Published online: 22 Sep 2015