A Fault Tolerance Manager with Distributed Coordinated Checkpoints for Automatic Recovery

Title: A Fault Tolerance Manager with Distributed Coordinated Checkpoints for Automatic Recovery
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Villamayor, J, Rexachs, D, Luque, E
Conference Name: 2017 International Conference on High Performance Computing & Simulation (HPCS)
Date Published: July
Keywords: automatic recovery, Checkpoint/Restart, Computer architecture, Distributed Checkpointing, Fault tolerance, Fault tolerant systems, Libraries, Maintenance engineering, Observers, Protocols
DOI: 10.1109/HPCS.2017.73