Title | A Fault Tolerance Manager with Distributed Coordinated Checkpoints for Automatic Recovery |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Villamayor, J, Rexachs, D, Luque, E |
Conference Name | 2017 International Conference on High Performance Computing & Simulation (HPCS) |
Date Published | July |
Keywords | automatic recovery, Checkpoint/Restart, Computer architecture, Distributed Checkpointing, Fault tolerance, Fault tolerant systems, Libraries, Maintenance engineering, Observers, Protocols |
DOI | 10.1109/HPCS.2017.73 |