Title: An approach to fault tolerance in the cloud using the checkpointing technique
Authors: Ghalem Belalem; Said Limam
Addresses: Department of Computer Science, Faculty of Sciences, University of Oran, B.P. 1524, EL M'Naouer, Oran, 31000, Algeria; Department of Computer Science, Faculty of Sciences, University of Oran, B.P. 1524, EL M'Naouer, Oran, 31000, Algeria
Abstract: Reliability refers to the probability that a system will offer failure-free service for a specified period of time within the bounds of a specified environment. For the cloud, reliability is broadly a function of the reliability of four individual components: 1) the hardware and software facilities offered by providers; 2) the provider's personnel; 3) connectivity to the subscribed services; 4) the subscriber's personnel. Providing redundant alternatives for every cloud component is too expensive. To reduce cost and build a highly reliable cloud within a limited budget, this paper proposes a fault-tolerant architecture for cloud computing that uses a dynamic and adaptive checkpoint mechanism.
Keywords: fault tolerance; cloud computing; virtualisation; checkpointing; resource management; reliability.
DOI: 10.1504/IJCNDS.2013.056221
International Journal of Communication Networks and Distributed Systems, 2013 Vol.11 No.3, pp.236 - 249
Published online: 28 Feb 2014