Title: An improved TFIDF algorithm based on dual parallel adaptive computing model
Authors: Yuwan Gu; Yaru Wang; Juan Huan; Yuqiang Sun; Shoukun Xu
Addresses: School of Information Science and Engineering, Changzhou University, Changzhou, China (all five authors)
Abstract: This paper proposes a dual parallel cloud computing framework based on the graphics processing unit (GPU) and MapReduce. The method addresses the low efficiency of text categorisation algorithms when they process large data sets on a stand-alone machine. It constructs an adaptive computation process for dual parallel computing, combines it with the advantages of an improved term frequency-inverse document frequency (TFIDF) algorithm, and thereby improves the TFIDF text categorisation algorithm with dual parallel adaptive computing. The efficiency of the improved TFIDF algorithm is compared across different operating environments and different numbers of computing nodes. The results show that the improved TFIDF based on dual parallel adaptation achieves a 6.48% higher Macro_F1 than CPU-based TFIDF, with a nearly sevenfold improvement in operating efficiency. As the number of nodes increases, the efficiency advantage of dual parallel adaptive computing becomes increasingly pronounced.
Keywords: improved TFIDF algorithm; MapReduce; graphics processing unit; GPU; parallel computation.
International Journal of Embedded Systems, 2020 Vol.13 No.1, pp.18 - 27
Received: 01 Dec 2018
Accepted: 20 Jan 2019
Published online: 08 Jul 2020
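The abstract does not specify how the paper's "improved" TFIDF weighting or its GPU/MapReduce adaptive scheduling work, so the following is only a minimal sketch of classical TF-IDF (tf = term count / document length, idf = log(N / df)) arranged as map, shuffle and reduce phases, the decomposition a MapReduce job would use. Everything runs in-process in plain Python; there is no Hadoop or GPU code, and the function names (`map_term_counts`, `shuffle`, `reduce_tfidf`, `tfidf`) are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict
from itertools import chain


def map_term_counts(doc_id, tokens):
    """Map phase: emit (term, (doc_id, count)) once per distinct term in a document."""
    for term, count in Counter(tokens).items():
        yield term, (doc_id, count)


def shuffle(pairs):
    """Group mapper output by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_tfidf(term, postings, doc_lengths, n_docs):
    """Reduce phase: turn one term's postings list into per-document TF-IDF weights.

    len(postings) is the document frequency df, because the mapper emits each
    term at most once per document.
    """
    idf = math.log(n_docs / len(postings))
    for doc_id, count in postings:
        tf = count / doc_lengths[doc_id]
        yield (doc_id, term), tf * idf


def tfidf(corpus):
    """Compute {(doc_id, term): weight} for a {doc_id: [tokens]} corpus."""
    doc_lengths = {doc_id: len(tokens) for doc_id, tokens in corpus.items()}
    mapped = chain.from_iterable(
        map_term_counts(doc_id, tokens) for doc_id, tokens in corpus.items()
    )
    weights = {}
    for term, postings in shuffle(mapped).items():
        weights.update(reduce_tfidf(term, postings, doc_lengths, len(corpus)))
    return weights


if __name__ == "__main__":
    corpus = {
        "d1": ["gpu", "parallel", "gpu"],
        "d2": ["mapreduce", "parallel"],
    }
    for key, weight in sorted(tfidf(corpus).items()):
        print(key, round(weight, 4))
```

In this toy corpus "parallel" occurs in both documents, so its idf is log(2/2) = 0 and its weight vanishes, while "gpu" in d1 receives (2/3)·ln 2. Because each reducer only needs the postings for its own term, the reduce work for different terms can be spread across nodes; that independence is what makes the calculation a natural fit for MapReduce.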