Title: Don't lose the point, check it: Is your cloud application using the right strategy?
Authors: Demis Gomes; Glauco Gonçalves; Patricia Endo; Moisés Rodrigues; Judith Kelner; Djamel Sadok; Calin Curescu
Addresses: Networking and Research Telecommunications Group (GPRT), Universidade Federal de Pernambuco (UFPE), Recife, Pernambuco, Brazil (Demis Gomes; Glauco Gonçalves; Patricia Endo; Moisés Rodrigues; Judith Kelner; Djamel Sadok); Ericsson Research Group, Kista, Sweden (Calin Curescu)
Abstract: Users pay to run their applications on cloud infrastructure, and in return they expect high availability and minimal data loss in case of failure. From a cloud provider's perspective, any hardware or software failure must be detected and recovered from as quickly as possible to maintain users' trust and avoid financial losses. From a user's perspective, failures must be transparent and should not affect application performance. To recover a failed application, cloud providers must perform checkpoints, periodically saving application data that can then be restored after a failover. A checkpoint service can currently be implemented in many ways, each with different performance results. The main research question is therefore: what is the best checkpoint strategy for a given set of user requirements? In this paper, we perform experiments with different checkpoint service strategies to understand how they are affected by the available computing resources. We also discuss the relationship between service availability and the checkpoint service.
Keywords: checkpoint; failover; performance evaluation; SAF standard.
DOI: 10.1504/IJGUC.2019.102735
International Journal of Grid and Utility Computing, 2019, Vol. 10, No. 6, pp. 681–693
Received: 21 Apr 2018
Accepted: 19 Nov 2018
Published online: 02 Oct 2019