Title: An overall approach to achieve load balancing for Hadoop Distributed File System
Authors: Chi-Yi Lin; Ying-Chen Lin
Addresses: Department of Computer Science and Information Engineering, Tamkang University, Taipei 25137, Taiwan; Department of Computer Science and Information Engineering, Tamkang University, Taipei 25137, Taiwan
Abstract: Hadoop Distributed File System (HDFS) is a popular cloud storage system that can scale up easily to meet the increasing demand for more storage capacity. In HDFS, files are divided into fixed-size blocks, which are then replicated and randomly stored on many DataNodes to prevent data loss. The random nature of the default block placement strategy may leave the DataNodes in a load-imbalanced state. Although HDFS has a built-in utility to achieve load balancing, it comes at the cost of reduced system performance because blocks must be moved around. In this paper, we take a holistic approach to achieve load balancing by considering all situations that may influence the load-balancing state. We designed a new role named BalanceNode to help match heavily loaded and lightly loaded DataNodes, so that the lightly loaded nodes can share part of the load from the heavily loaded ones. We also designed a better block placement strategy to make the storage load as balanced as possible in the first place. The simulation results show that our approach achieves a better load-balancing state than existing algorithms.
Keywords: cloud computing; Hadoop Distributed File System; load balancing.
DOI: 10.1504/IJWGS.2017.087370
International Journal of Web and Grid Services, 2017 Vol.13 No.4, pp.448 - 466
Received: 20 Oct 2016
Accepted: 15 Jul 2017
Published online: 13 Oct 2017
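
Illustrative sketch: the abstract describes a BalanceNode that matches heavily and lightly loaded DataNodes but gives no algorithmic detail. The Python below is only a minimal sketch of one way such matching could work, not the authors' method. The function name `match_nodes`, the `threshold` parameter, the use of storage utilization as the load measure, and the greedy pairing rule are all assumptions made for illustration.

```python
from statistics import mean


def match_nodes(usage, threshold=0.1):
    """Pair heavily loaded nodes with lightly loaded ones.

    usage: dict mapping a node name to its storage utilization (0.0 to 1.0).
    threshold: hypothetical margin around the cluster average; nodes within
    the margin are treated as balanced and left alone.
    """
    avg = mean(usage.values())
    # Assumed rule: a node is "heavy" or "light" if it deviates from the
    # cluster average by more than the threshold.
    heavy = sorted((n for n in usage if usage[n] > avg + threshold),
                   key=lambda n: usage[n], reverse=True)
    light = sorted((n for n in usage if usage[n] < avg - threshold),
                   key=lambda n: usage[n])
    # Greedy pairing (an assumption): most loaded with least loaded first.
    return list(zip(heavy, light))


pairs = match_nodes({"dn1": 0.9, "dn2": 0.5, "dn3": 0.1, "dn4": 0.5})
print(pairs)
```

Under these assumptions, each returned pair names a node that could hand off blocks and a node that could receive them; nodes close to the cluster average are not paired at all.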