|Title||H-RADIC: The Fault Tolerance Framework for Virtual Clusters on Multi-Cloud Environments|
|Publication Type||Conference Proceedings|
|Year of Conference||2018|
|Authors||Royo, A, Villamayor, J, Castro-León, M, Rexachs del Rosario, D, Luque Fadón, E|
|Conference Name||Jornadas de Cloud Computing & Big Data (JCC&BD)|
|Publisher||Facultad de Informática|
|Conference Location||La Plata, Argentina|
Even though cloud platforms promise to be reliable, several availability incidents show that they are not. How can we be sure that a parallel application finishes its execution even if a site is affected by a failure? This paper presents H-RADIC, an approach based on the RADIC architecture that executes a parallel application on at least 3 different virtual clusters, or sites. The execution state of each site is saved periodically at another site and is recovered from there in case of failure. The paper details the configuration of the architecture and the experimental results obtained with 3 virtual clusters running NAS parallel applications protected with DMTCP, a well-known distributed multi-threaded checkpointing tool. Our experiments show that execution time increased by 5% to 36% without failures and by 27% to 66% when failures occurred.
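As a rough illustration of the idea that each site's state is saved at another site, the following minimal sketch pairs every site with a different site that holds its checkpoints. The ring layout, the site names, and the functions `assign_protectors` and `recovery_source` are assumptions made for this example; the abstract does not say how H-RADIC actually chooses where each site's state is stored.

```python
def assign_protectors(sites):
    """Pair each site with the next site in a ring (an assumed layout).

    The returned dict maps a site to the site that stores its checkpoints.
    """
    if len(sites) < 3:
        # The abstract describes execution on at least 3 sites.
        raise ValueError("at least 3 sites are required")
    return {site: sites[(i + 1) % len(sites)] for i, site in enumerate(sites)}


def recovery_source(failed_site, protectors):
    """Return the site that holds the last saved state of failed_site."""
    return protectors[failed_site]


# Hypothetical site names, used only for illustration.
sites = ["site-a", "site-b", "site-c"]
protectors = assign_protectors(sites)
```

With this layout no site stores its own state, so the loss of any single site still leaves its latest checkpoint available elsewhere.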