Title: Optimal configuration of fault-tolerance parameters for distributed replicated server access
Authors: Alessandro Daidone; Thibault Renier; Andrea Bondavalli; Hans Peter Schwefel
Addresses: ResilTech s.r.l., Piazza Iotti 25, 56025 Pontedera (PI), Italy ' ALTEN SA, 130-136, rue de Silly – 92773, Boulogne-Billancourt Cedex, France ' Dipartimento di Sistemi ed Informatica, Università degli studi di Firenze, Viale Morgagni 65, 50134 Firenze (FI), Italy ' Forschungszentrum Telekommunikation Wien, Donau-City-Str 1/3, 1220 Wien, Austria; Department of Electronic Systems, Networking and Security (NetSec) Section, Aalborg University, Niels Jernes Vej 12/A5-212, 9220 Aalborg-Øst, Denmark
Abstract: Server replication is a common fault-tolerance strategy to improve transaction dependability for services in communications networks. In distributed architectures, fault-diagnosis and recovery are implemented via the interaction of the server replicas with the clients and other entities such as enhanced name servers. Such architectures provide an increased number of redundancy configuration choices. The influence of a (wide area) network connection can be quite significant and induce trade-offs between dependability and user-perceived performance. This paper develops a quantitative stochastic model using stochastic activity networks (SAN) for the evaluation of performance and dependability metrics of a generic transaction-based service implemented on a distributed replication architecture. The composite SAN model can be easily adapted to a wide range of client-server applications deployed in replicated server architectures. In order to obtain insight into the system behaviour, a set of relevant environment parameters and controllable fault-tolerance parameters are chosen and the dependability/performance trade-off is evaluated.
Keywords: dependability; availability; model-based evaluation; stochastic activity networks; distributed architectures; fault tolerance; replicated server access; server replication; fault diagnosis; fault recovery; wide area networks; WANs; stochastic modelling.
DOI: 10.1504/IJCCBS.2013.056493
International Journal of Critical Computer-Based Systems, 2013 Vol.4 No.2, pp.144 - 172
Received: 13 Jul 2012
Accepted: 04 Jun 2013
Published online: 29 Apr 2014