Title: Distributed data mining in grid computing environment
Authors: Jianlan Ren; Zhongsheng Chen; Zheng Zhang
Addresses: Jiangxi V&T College of Communications, Nanchang 330013, Jiangxi, China; College of Land and Resources, China West Normal University, Nanchong 637002, Sichuan, China; Jiangxi V&T College of Communications, Nanchang 330013, Jiangxi, China
Abstract: With the rapid development of computer technology, the data generated in scientific research, industry and commerce is growing at an alarming rate. Traditional data mining techniques are limited to mining a single data source. How to mine distributed data sources and how to perform parallel mining are among the hot topics in the field of data mining. The purpose of this article is to study distributed data mining in a grid computing environment. The paper reviews existing grid technology and data mining technology and discusses the possibility of combining the two. On this basis, a grid-based distributed data mining service framework is proposed and a detailed design of the framework is developed. The framework is then tested, and the experimental results show that applying the grid framework to distributed mining can improve computing performance and the size of the data that can be handled. The computational speedup of the framework is measured on 1 to 8 nodes, and the speedup ratios are 1, 2, 3, 4, 5, 6, 7 and 8 respectively. It can be seen that the performance of the framework is directly proportional to the scale of the computation, that is, the number of nodes used.
Keywords: grid computing; data mining; distributed applications; knowledge discovery framework; web service resources.
DOI: 10.1504/IJWGS.2020.109474
International Journal of Web and Grid Services, 2020 Vol.16 No.3, pp.305 - 320
Received: 08 Feb 2020
Accepted: 27 Apr 2020
Published online: 09 Sep 2020
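
Note on the speedup figures: the abstract reports speedup ratios of 1 to 8 on 1 to 8 nodes. The sketch below is a minimal illustration of how such figures are conventionally read, assuming the standard definitions of speedup (single-node run time divided by n-node run time) and parallel efficiency (speedup divided by node count). The function names are hypothetical and are not taken from the paper; only the eight reported ratios come from the abstract.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Conventional speedup: single-node run time over n-node run time."""
    return t_single / t_parallel


def efficiency(speedup_ratio: float, nodes: int) -> float:
    """Parallel efficiency: speedup divided by the number of nodes."""
    return speedup_ratio / nodes


# Speedup ratios as reported in the abstract, keyed by node count.
reported_speedup = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0,
                    5: 5.0, 6: 6.0, 7: 7.0, 8: 8.0}

# Efficiency per node count; a value of 1.0 means ideal linear scaling.
reported_efficiency = {n: efficiency(s, n) for n, s in reported_speedup.items()}

if __name__ == "__main__":
    for n, e in reported_efficiency.items():
        print(f"{n} node(s): speedup {reported_speedup[n]:.1f}, efficiency {e:.2f}")
```

Under these definitions the reported ratios correspond to an efficiency of 1.0 at every node count, which is what direct proportionality between speedup and node count means.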