Issue 1 - 2021-01-29

Download the complete issue as PDF from Zenodo.

The Journal of High-Performance Storage (JHPS) is a new open-access journal edited by storage experts that unites key features of journals fostering openness and trust in storage research. In particular, JHPS offers open reviews, living papers, digital replicability, and free open access.

The editorial team is proud to announce the publication of the first JHPS issue today, which represents an important milestone. The first issue contains just one publication; however, the difficult situation in 2020 impaired the submission numbers. We use this chance to look back at the developments during this turbulent year. While the webpage was officially launched about one year ago in 2020, we knew that the processes and toolchains needed further development and testing. As it turned out, the year was even more challenging than we anticipated, not only for HPC and storage experts but for society as a whole, which faced the COVID-19 pandemic. For researchers, the pandemic affected their research focus, administrative tasks, and productivity, which in turn affected their publication behavior.

In 2020, JHPS reviewed the processes revolving around publication; we improved their quality and extended the capabilities of the tools based on feedback from authors and reviewers. In particular, we thank the HPC-IODC workshop for the fruitful collaboration with JHPS to test the open review process on the research papers submitted to HPC-IODC. Initially, we explored the Google Docs format for the public review process because its suggestion mode is powerful and allows reviewers to add comments and minor suggestions effectively. However, it turned out that the typesetting features provided by Google Docs do not meet our aspirations for high-quality camera-ready publications. Therefore, we developed a LaTeX template and a Google Docs plugin to allow annotations to LaTeX files hosted on GitHub. This tooling proved to yield high productivity while remaining inclusive for public reviewers. Additionally, we introduced the JHPS Manuscript Central, a lightweight web-based system that manages the relevant publication workflows for authors and reviewers.

Now that we are confident in the effectiveness of the established workflows and tools, our goal is to foster the adoption of the journal and to refine the workflows for digital replicability.

We thank all authors, reviewers, and readers.

Cordially,
Julian Kunkel, Jean-Thomas Acquaviva, Suren Byna, Adrian Jackson, Ivo Jimenez, Anthony Kougkas, Jay Lofstead, Glenn K. Lockwood, Carlos Maltzahn, George S. Markomanolis, Lingfang Zeng
JHPS Editors

Articles

Classifying Temporal Characteristics of Job I/O Using Machine Learning Techniques

Eugen Betke, Julian Kunkel
Keywords
  • IO fingerprinting
  • performance analysis
  • monitoring
Date: 2021-01-29
Version: 1.0

BibTeX

@article{ JHPS-2021--1,
author = {Eugen Betke \and Julian Kunkel},
title = {{Classifying Temporal Characteristics of Job I/O Using Machine Learning Techniques}},
year = {2021},
month = {01},
journal = {Journal of High-Performance Storage},
series = {Issue },
isbn = {},
doi = {10.5281/zenodo.4478960},
url = {\url{https://jhps.vi4io.org/issues/#-1}},
abstract = {{Every day, supercomputers execute 1000s of jobs with different characteristics. Data centers monitor the behavior of jobs to support the users and improve the infrastructure, for instance, by optimizing jobs or by determining guidelines for the next procurement. The classification of jobs into groups that express similar run-time behavior aids this analysis as it reduces the number of representative jobs to look into. This work utilizes machine learning techniques to cluster and classify parallel jobs based on the similarity in their temporal I/O behavior. Our contribution is the qualitative and quantitative evaluation of different I/O characterizations and similarity measurements and the development of a suitable clustering algorithm. In the evaluation, we explore I/O characteristics from monitoring data of one million parallel jobs and cluster them into groups of similar jobs. Therefore, the time series of various I/O statistics is converted into features using different similarity metrics that customize the classification. When using general-purpose clustering techniques, suboptimal results are obtained. Additionally, we extract phases of I/O activity from jobs. Finally, we simplify the grouping algorithm in favor of performance. We discuss the impact of these changes on the clustering quality.}}
}

Every day, supercomputers execute 1000s of jobs with different characteristics. Data centers monitor the behavior of jobs to support the users and improve the infrastructure, for instance, by optimizing jobs or by determining guidelines for the next procurement. The classification of jobs into groups that express similar run-time behavior aids this analysis as it reduces the number of representative jobs to look into. This work utilizes machine learning techniques to cluster and classify parallel jobs based on the similarity in their temporal I/O behavior. Our contribution is the qualitative and quantitative evaluation of different I/O characterizations and similarity measurements and the development of a suitable clustering algorithm.

In the evaluation, we explore I/O characteristics from monitoring data of one million parallel jobs and cluster them into groups of similar jobs. Therefore, the time series of various I/O statistics is converted into features using different similarity metrics that customize the classification.

When using general-purpose clustering techniques, suboptimal results are obtained. Additionally, we extract phases of I/O activity from jobs. Finally, we simplify the grouping algorithm in favor of performance. We discuss the impact of these changes on the clustering quality.
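
To make the idea in the abstract more concrete, the following is a minimal sketch of how per-job I/O time series could be turned into features and grouped by similarity. It is not the algorithm from the article: the quantization thresholds, the position-wise similarity metric, the greedy "leader" clustering, and all function names (`quantize`, `similarity`, `leader_cluster`) are assumptions chosen only for illustration.

```python
from typing import List, Sequence


def quantize(series: Sequence[float], low: float = 1.0, high: float = 100.0) -> List[int]:
    """Map each sample to a coarse level: 0 = idle, 1 = moderate, 2 = heavy.

    The thresholds are illustrative assumptions, not values from the article.
    """
    return [0 if v < low else 1 if v < high else 2 for v in series]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of time segments with the same level.

    The shorter job is padded with idle (0) segments so both have equal length.
    """
    n = max(len(a), len(b))
    if n == 0:
        return 1.0
    pa = list(a) + [0] * (n - len(a))
    pb = list(b) + [0] * (n - len(b))
    return sum(x == y for x, y in zip(pa, pb)) / n


def leader_cluster(features: Sequence[Sequence[int]], threshold: float = 0.75) -> List[List[int]]:
    """Greedy single-pass clustering.

    Each job joins the first cluster whose leader (first member) is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters: List[List[int]] = []
    for idx, feat in enumerate(features):
        for members in clusters:
            if similarity(features[members[0]], feat) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


# Hypothetical I/O rates per time segment for four jobs (made-up numbers).
jobs = [
    [0.0, 250.0, 0.0, 250.0],  # bursty pattern
    [0.2, 300.0, 0.1, 180.0],  # same pattern, different magnitudes
    [50.0, 50.0, 50.0, 50.0],  # steady moderate I/O
    [40.0, 60.0, 55.0],        # steady moderate I/O, shorter job
]
features = [quantize(job) for job in jobs]
clusters = leader_cluster(features)
print(clusters)  # [[0, 1], [2, 3]]
```

A single-pass grouping like this trades clustering quality for speed, which loosely mirrors the trade-off the abstract mentions when simplifying the grouping algorithm in favor of performance.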