Analyzing the I/O Patterns of Deep Learning Applications

Title: Analyzing the I/O Patterns of Deep Learning Applications
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Párraga, E, León, B, Bond, R, Encinas, D, Bezerra, A, Mendez, S, Rexachs, D, Luque, E
Editor: Naiouf, Marcelo, Rucci, Enzo, Chichizola, Franco, De Giusti, L
Conference Name: Cloud Computing, Big Data & Emerging Topics
Date Published: 08/2021
Publisher: Springer International Publishing
Conference Location: Argentina
ISBN Number: 978-3-030-84825-5
Keywords: Deep learning, Distributed DL, I/O HPC, I/O Patterns

A traditional HPC storage system is designed to manage an I/O workload dominated by bursts of write operations, mainly from applications that run simulations and checkpoint partial results. Today this workload is more diverse because of artificial intelligence applications, such as machine learning (ML) and deep learning (DL). As ML/DL applications become more compute-intensive, they require the power of HPC systems. However, the HPC I/O system could be a bottleneck to scaling these kinds of applications, mainly in the training stage. In this paper, we present a methodology for analyzing the I/O patterns of deep learning applications that allows us to understand their I/O behavior in HPC systems. We have applied our approach to serial and distributed DL codes, using the TensorFlow 2 and PyTorch frameworks with the MNIST and CIFAR-10 datasets.
