H-RADIC: The Fault Tolerance Framework for Virtual Clusters on Multi-Cloud Environments

Title: H-RADIC: The Fault Tolerance Framework for Virtual Clusters on Multi-Cloud Environments
Publication Type: Conference Proceedings
Year of Conference: 2018
Authors: Royo, A, Villamayor, J, Castro-León, M, Rexachs del Rosario, D, Luque Fadón, E
Conference Name: Jornadas de Cloud Computing & Big Data (JCC&BD)
Pagination: 7-13
Date Published: 10/2018
Publisher: Facultad de Informática
Conference Location: La Plata, Argentina
ISBN: 978-950-34-1659-4
Abstract

Although cloud platforms promise to be reliable, several availability incidents show that they are not always so. How can we be sure that a parallel application finishes its execution even if a site is affected by a failure? This paper presents H-RADIC, an approach based on the RADIC architecture that executes a parallel application on at least 3 different virtual clusters, or sites. The execution state of each site is saved periodically at another site and is recovered from there in case of failure. The paper details the configuration of the architecture and the experimental results obtained with 3 virtual clusters running NAS parallel applications protected with DMTCP, a well-known distributed multi-threaded checkpointing tool. Our experiments show that execution time increased by 5% to 36% without failures and by 27% to 66% when failures occurred.
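
The cross-site protection described in the abstract can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions, not the authors' implementation: the ring placement (site i stores its checkpoint on site i + 1), the `Site` class, and every function name are hypothetical, and an integer stands in for the checkpoint image that a tool such as DMTCP would actually produce.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical virtual cluster: its own state plus checkpoints held for peers."""
    name: str
    state: int = 0          # stand-in for the application's execution state
    alive: bool = True
    stored: dict = field(default_factory=dict)  # checkpoints kept for other sites


def protector_of(index: int, n_sites: int) -> int:
    """Assumed ring placement: site i saves its checkpoint on site (i + 1) % n."""
    return (index + 1) % n_sites


def checkpoint_all(sites):
    """Copy each live site's state to its protector site."""
    for i, site in enumerate(sites):
        protector = sites[protector_of(i, len(sites))]
        if site.alive and protector.alive:
            protector.stored[site.name] = site.state


def recover(sites, failed_index):
    """Restart a failed site from the last checkpoint held by its protector."""
    failed = sites[failed_index]
    protector = sites[protector_of(failed_index, len(sites))]
    failed.state = protector.stored[failed.name]
    failed.alive = True
    return failed.state


def run_demo():
    """Run 7 steps with a checkpoint every 3 steps, then fail and recover one site."""
    sites = [Site("site-a"), Site("site-b"), Site("site-c")]
    for step in range(1, 8):
        for site in sites:
            site.state = step
        if step % 3 == 0:
            checkpoint_all(sites)
    sites[1].alive = False   # simulate the loss of one site
    sites[1].state = 0
    return recover(sites, 1)


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the failed site resumes from step 6, the last checkpointed step, so the work done in step 7 has to be repeated. Repeated work of this kind is one plausible reason why the overhead with failures is higher than the overhead without them.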

URL: http://sedici.unlp.edu.ar/handle/10915/69674