CHEOPS Workshop at EuroSys 2024

The fourth Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems (CHEOPS'24)

Held in conjunction with EuroSys 2024 on April 22nd 2024, Athens, Greece

Agenda

The workshop will take place at the Abbey Hall of the Royal Olympic Hotel. The proceedings are available in the ACM Digital Library. One selected paper was also published in ACM SIGOPS Operating Systems Review (Volume 58, Issue 1).

Time	Content
8:30-9:00	Registration
9:00-9:10	Welcome - Shadi Ibrahim and Suren Byna
	Keynote Session - Chair: Suren Byna
9:10-10:00	*Keynote Speaker* - Ana Klimovic (ETH Zurich) Data Management for Cost-Efficient Machine Learning (More, Slides)
	Session 1 - Chair: Shadi Ibrahim
10:00-10:30	ZWAL: Rethinking Write-ahead Logs for ZNS SSDs with Zone Appends (Slides) Krijn Doekemeijer (Vrije Universiteit Amsterdam), Zebin Ren (Vrije Universiteit Amsterdam), Nick Tehrany (BlueOne Business Software LLC.), and Animesh Trivedi (Vrije Universiteit Amsterdam)
10:30-11:00	Coffee break
	Session 2 - Chair: Jay Lofstead
11:00-11:30	TrackIops: Real-Time NFS Performance Metrics Extractor (Slides) Théophile Dubuc (ENS de Lyon, Outscale), Pascale Vicat-Blanc (Inria, ENS de Lyon), Pierre Olivier (The University of Manchester), Mar Callau-Zori (Outscale), Christophe Hubert (Outscale), and Alain Tchana (Grenoble INP)
11:30-12:00	zns-tools: An eBPF-powered, Cross-Layer Storage Profiling Tool for NVMe ZNS SSDs (Slides) Nick Tehrany (BlueOne Business Software LLC.), Krijn Doekemeijer (Vrije Universiteit Amsterdam), and Animesh Trivedi (Vrije Universiteit Amsterdam)
12:00-12:30	Hawkeyes: Addressing Weak Memory Order in Program Migration Based on Instruction Windows (Slides) Zhangqi Zhu (Alibaba Group), Yuhui Cai (Xiamen University), Binbin Xu (Alibaba Group), Pingchao Yang (Alibaba Group), Jicheng Chen (Alibaba Group), and Zhirong Shen (Xiamen University)
12:30-13:45	Lunch
	Invited talk Session 1 - Chair: Kira Duwe
13:45-14:15	Jay Lofstead (Sandia National Laboratories, USA) HPC and databases revisited (More, Slides)
14:15-14:45	Michael Hennecke (Intel, Germany) DAOS - The Next Generation (More, Slides)
14:45-15:15	Shaohuai Shi (Harbin Institute of Technology, China) Scaling Hybrid Parallel Training with Efficient and Reliable In-memory Checkpointing (More, Slides)
15:15-15:30	[WiP] Malphas: Managing Legions of Regions (Slides) Ferdinand Gruber (Technical University of Munich), Lukas Vogel (Technical University of Munich), Jana Giceva (Technical University of Munich)
15:30-16:00	Coffee break
	Invited talk Session 2 - Chair: Suren Byna/Shadi Ibrahim
16:00-16:30	Animesh Trivedi (VU Amsterdam, The Netherlands) Exploring Storage Stack on Modern NVMe Hardware (More, Slides)
16:30-16:50	Discussion
16:50-17:00	Closing Remarks

Keynote Speaker

Ana Klimovic, ETH Zurich

Keynote: Data Management for Cost-Efficient Machine Learning

Abstract

Training AI models is increasingly resource-intensive, time-consuming, and expensive as model sizes and datasets continue to grow. This talk will explore how to make training ML models more cost-effective from a data management and data preprocessing perspective.
First, we will discuss how to detect and eliminate input data processing bottlenecks that can occur in ML training and cause expensive GPU/TPU resources to be heavily underutilized. I will present Cachew, a disaggregated data preprocessing system architecture built on top of TensorFlow’s tf.data, which automatically scales resources for data processing to avoid input data stalls. Cachew also maintains a global view of data processing across jobs, which enables selectively caching preprocessed datasets to maximize training throughput and improve energy efficiency across jobs.
Second, we will discuss how to make ML training more data-efficient in practical deployments where training data evolves (e.g., grows) over time and models frequently need to be retrained. Optimizing training costs for these applications requires optimizing how often a model is retrained and on how much data, while maintaining high accuracy. I will present Modyn, an open-source platform for ML training on dynamic datasets that enables researchers to explore a variety of retraining and data selection policies to minimize the cost required to reach a target model accuracy.

Bio

Ana Klimovic is an Assistant Professor in the Systems Group of the Computer Science Department at ETH Zurich. Her research interests span operating systems, cloud computing, and their intersection with machine learning. Ana’s work focuses on computer system design for large-scale applications such as cloud computing services, data analytics, and machine learning. Before joining ETH in August 2020, Ana was a Research Scientist at Google Brain and completed her Ph.D. in Electrical Engineering at Stanford University.

Invited Speakers

Jay Lofstead, Sandia National Laboratories

Invited Talk: HPC and databases revisited

Abstract

Around twenty-five years ago, the HPC storage and IO community investigated the potential for relational databases for HPC data management and found numerous issues making an RDBMS a poor choice. SciDB made decisions as well that further cemented the difficulty in terms of ingestion velocity and overhead against an RDBMS mode for HPC data management. More recent work in the metadata arena, such as EMPRESS, and in a new IO library, Stitch-IO, shows a potential path towards bringing these communities together. However, the challenges present offer potentially new research and a need to overcome well-entrenched bias. This talk will explore the roots of this bias, why we should rethink, and propose a path towards data management bliss.

Bio

Jay Lofstead is a Principal Member of Technical Staff at Sandia National Laboratories. His research interests focus on large-scale data management and trusting scientific computing. In particular, he works on storage, IO, metadata, workflows, reproducibility, software engineering, machine learning, and operating system-level support for any of these topics. Broadly across these topics, he is also deeply interested in ethics related to these topics and computing in general and how to drive inclusivity across the computation-related science domains. Dr. Lofstead received his Ph.D. in Computer Science from the Georgia Institute of Technology in 2010.

Michael Hennecke, Intel

Invited Talk: DAOS - The Next Generation

Abstract

DAOS (https://daos.io/) is an open-source scale-out all-flash versioning object store that delivers extremely high performance to the most data-intensive HPC and AI workloads. It presents a rich and scalable storage interface that allows efficient storage of both structured and unstructured data. DAOS supports multiple application interfaces including a parallel filesystem, MPI-IO, HDF5, Hadoop/Spark connector, TensorFlow-IO, native Python dictionary bindings as well as domain-specific data models. While DAOS was originally based on Optane Persistent Memory, DAOS 2.4 now supports DAOS servers both with and without PMem. This presentation will highlight recent and upcoming technology developments in DAOS, and will explore how the DAOS Foundation (established in 2023 under the umbrella of the Linux Foundation) can facilitate further community contributions to DAOS.

Bio

Michael Hennecke is a Principal Engineer for HPC Storage in the Intel DAOS engineering team, and the chair of the DAOS Foundation’s Outreach Committee. Michael has 30 years of experience in High Performance Computing, including previous roles at Lenovo, IBM, and the Computing Center of the University of Karlsruhe. His research interests include HPC Storage and Performance Measurement & Modeling. Michael holds a Master of Science in Physics from the University of Bochum (Germany) and a Distinguished IT Specialist certification by The Open Group.

Shaohuai Shi, Harbin Institute of Technology

Invited Talk: Scaling Hybrid Parallel Training with Efficient and Reliable In-memory Checkpointing

Abstract

Hybrid parallelism is a standard approach to train large models (LM) on large-scale GPU clusters. It usually takes tens of thousands of GPUs for a prolonged duration, accompanied by frequent hardware and software failures. Against such failures, asynchronous checkpointing is proposed to snapshot parameters from the device to host memory and persist in remote storage, ideally without disrupting the training process. However, our investigation reveals that such state-of-the-art (SOTA) solutions impose GPU operation interference and cause large-scale GPUs to idle during parameter saving and loading, eventually increasing the carbon footprint and the financial cost of training. In this talk, I will introduce our recent work BurstCheck, a fault-tolerant distributed training framework with in-memory checkpoint design. It swiftly saves and loads checkpoints in local CPU memory with three optimizations: 1) Efficient in-memory checkpoint saving, 2) Reliable in-memory checkpoint protection, and 3) Accelerated failure recovery. We also propose a fast failure recovery scheme that enables training to swiftly and elastically restart from inter-node checkpoint gathering upon node failures, bypassing the inefficiencies of Network-File-System (NFS) read. On our 128 A800 GPU testbed, BurstCheck achieves SOTA parameter saving speed with minimum checkpointing overhead. It also guarantees the recovery of training from in-memory checkpoint in most failure circumstances.

Bio

Dr. Shaohuai Shi is currently a full professor with the School of Computer Science and Technology at Harbin Institute of Technology, Shenzhen. Prior to that, he was a Research Assistant Professor at The Hong Kong University of Science and Technology. He received his Ph.D. degree, master’s degree, and bachelor’s degree from Hong Kong Baptist University in 2020, Harbin Institute of Technology in 2013, and South China University of Technology in 2010, respectively. His current research interests focus on distributed machine learning systems. He has published over 30 peer-reviewed papers in top-tier venues such as IEEE TPDS, IEEE INFOCOM, ICLR, EuroSys, MLSys, etc. His research papers have received over 2000 citations and an H-index of 22 according to Google Scholar. He won the Best Paper Award in IEEE INFOCOM 2021 and IEEE DataCom 2018.

Animesh Trivedi, VU Amsterdam

Invited Talk: Exploring Storage Stack on Modern NVMe Hardware

Abstract

We are living in a data-centric society where every aspect of our daily life is heavily influenced by data-centric services. As the volume and speed of data generation increase, so does demand on how efficiently data can be stored and processed. Over the past decade, the Linux storage stack has undergone significant restructuring to meet the performance demands of modern workloads, yet much research around new hardware and software interfaces remains to be explored. In this talk, I will summarize the most recent research results from my group on (i) the performance characterization of Linux I/O stack APIs with its newest API, io_uring, on modern fast NVMe SSDs (Systor’22, CHEOPS’23); (ii) exploration of quality-of-service and scheduling challenges with NVMe and the new Zoned Namespace (ZNS) interfaces (CLUSTER’23, ICPE’24). The emergence and standardization of ZNS storage hardware have led to the re-evaluation of responsibilities between storage hardware (SSDs) and storage software, which we are actively exploring by designing novel storage services. Code and reports from our group are available at: https://animeshtrivedi.github.io/team/ and https://github.com/stonet-research.

Bio

Animesh Trivedi is a tenured Assistant Professor in the Computer Science department at VU Amsterdam where he leads the storage research group. His research interests lie in building fast and efficient data storage systems using modern hardware. He is currently exploring how to leverage cheap and fast flash NVM storage devices to support the demands of data-heavy workloads. He has authored more than 50 research papers and several international patents. His research and educational activities are supported by the Dutch Research Council (NWO), the EU (Cloudstars MSCA Staff Exchanges), and various industrial partners (Western Digital, Xilinx, AWS, Mellanox). For his research and educational work he has won multiple awards with his team (Best Paper Systor’17, Best Presentation CompSys’19, Best BSc and MSc Amsterdam Thesis Awards 2020 2022, Best FAIR Artifact Award IEEE ACSOS 2021). In 2023, he was inducted into the Amsterdam Young Academy. He has served on the program committee of multiple flagship systems conferences and workshops, and is the Storage Track co-chair for the 24th IEEE/ACM CCGRID conference 2024. Prior to joining VU Amsterdam in 2019, he worked as a Research Staff Member at the IBM Research Lab in Zürich. He holds a PhD and Master from ETH Zürich. More about him can be found at https://animeshtrivedi.github.io.

Workshop Description

The CHEOPS workshop is aimed at researchers, developers of scientific applications, engineers and everyone interested in the evolution of storage systems. As the developments of computing power, storage and network technologies continue to diverge, the bandwidth performance gap between them widens. This trend, combined with ever-growing data volumes and data-driven computing such as machine learning, results in I/O and storage limitations, impacting the scalability and efficiency of current and future computing systems. Some of these challenges are quantitative, such as scaling to match exascale system requirements, or reducing the latency of the software stack to efficiently integrate new generations of hardware like storage class memory (SCM). Other issues are more subtle and arise with the increased complexity of storage solutions, like new, smarter and more potent data management tools, monitoring systems or interoperability between I/O components or data formats.

The main objective of this workshop is to discuss state-of-the-art research, innovative ideas and experiences that focus on the design and implementation of storage systems in both academic and industrial worlds.

Important Dates

Abstract Submission: (Final Extension) February 8, 2024 (Anywhere on Earth)
Paper Submission: (Final Extension) February 8, 2024 (Anywhere on Earth)
Notification to Authors: March 1, 2024
Camera-Ready Deadline: March 11, 2024
Workshop Date: April 22, 2024

Submission Guidelines

In order to guarantee the quality of the submissions, we have formed a globally distributed, diverse program committee. All submissions will be reviewed by the program committee. We will use HotCRP to manage the submissions. The reviewing process will be double blind with at least 3 reviews for each submission. An online discussion will determine which papers to accept.

Only original and novel work not currently under review in other venues will be considered for publication. Submissions can either be full papers (6 pages) or short papers (4 pages). The page count includes the title, text, figures, and appendices but excludes the references. Submissions must be uploaded electronically as PDF files formatted according to the submission rules of EuroSys. Accepted submissions will have to comply with the EuroSys proceedings format. One author of each accepted paper is required to register for the workshop and present the paper. Extended versions of selected papers will be considered for publication in the ACM SIGOPS Operating Systems Review journal.

Presentations can be given either in-person or via pre-recorded videos. In the latter case, the organizers will collect questions during the presentation and perform a live Q&A session with the presenter, who should be available via video conference.

Rights Forms

You will find a link to the ACM copyright form on your paper’s HotCRP page. Once completed, ACM will send out the information and LaTeX directives (DOI, ISBN etc.) needed to complete the camera-ready version of your paper.

Camera-Ready Format

You should use the acmart document class (https://www.acm.org/publications/proceedings-template, the same as for submission), as follows: \documentclass[sigplan,10pt]{acmart}

As mentioned above, you will receive instructions regarding some LaTeX directives (\setcopyright, \acmConference, \acmDOI etc.) after completing the copyright form.

All accepted papers can use up to 2 additional pages for the camera-ready version, for a final limit of 8 (full papers) or 6 (short papers) pages, references not included.

Note that Type 1 fonts (scalable) should be used, not Type 3 (bitmapped), and that all fonts must be embedded. Type and embedding of fonts can be checked with various tools including pdffonts. Page numbers should be suppressed. Also make sure that the PDF is searchable by testing the search function in a PDF reader.
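
For orientation, the camera-ready instructions above could be combined into a skeleton along the following lines. This is only a sketch: the copyright type, DOI, ISBN, conference string, title, author and affiliation are placeholders invented for illustration, and the exact directives sent by ACM after the copyright form is completed take precedence over anything shown here.

```latex
% Illustrative camera-ready skeleton; all values below are placeholders.
\documentclass[sigplan,10pt]{acmart}

% Replace these lines with the exact directives ACM sends after the
% copyright form has been completed (assumed values, not the real ones).
\setcopyright{acmlicensed}
\copyrightyear{2024}
\acmYear{2024}
\acmConference[CHEOPS '24]{Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems}{April 22, 2024}{Athens, Greece}
\acmDOI{XXXXXXX.XXXXXXX}
\acmISBN{XXX-X-XXXX-XXXX-X}

% Suppress page numbers, as required for the camera-ready version.
\settopmatter{printfolios=false}

\begin{document}

\title{Placeholder Paper Title}
\author{First Author}
\affiliation{%
  \institution{Example University}
  \country{Country}}
\email{first.author@example.org}

% In acmart, the abstract must appear before \maketitle.
\begin{abstract}
Placeholder abstract.
\end{abstract}

\maketitle

\section{Introduction}
Body text goes here.

\end{document}
```

After compiling, running pdffonts on the resulting PDF (for example, a hypothetical paper.pdf) lists each font's type and whether it is embedded, which is one way to check the font requirements above.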

Uploading Final Versions

The camera-ready version of your paper and its LaTeX sources have to be uploaded via HotCRP. As a reminder, the camera-ready deadline for all papers is March 11th, 2024.

Topics of Interest

Submissions may be more hands-on than research papers and we therefore explicitly encourage submissions in the early stages of research. Topics of interest include, but are not limited to:

Operating system optimizations
Kernel and user space file/storage systems
- Including virtual file systems
Cloud, parallel and distributed file/storage systems
- Network challenges, such as scalability, QoS and partitionability
Approaches for low-latency and heterogeneous storage systems
- Such as SCM and NVRAM combined with HDDs
Metadata management
Machine Learning and Artificial Intelligence
- Storage requirements of ML and AI applications
- Using ML and AI within storage systems (e.g., to replace heuristics)
Hybrid solutions using file systems and databases
- Approaches using query and database interfaces, including key-value stores
- Optimized indexing techniques
Data organizations to support online workflows
Data privacy and data security
Domain-specific data management solutions
- Application I/O characterization
Storage systems modeling and analysis tools
Data reduction techniques
- Lossless and lossy compression, deduplication
UI/UX for storage systems
Related experiences from users: what worked, what didn’t?
- Feedback and empirical evaluation of storage systems

WORK IN PROGRESS (WIP) SESSION

There will be a WIP session where presenters give brief (5-minute) talks on their ongoing work, with fresh problems/solutions. WIP will not be included in the proceedings. A one-page abstract is required.

DEADLINES

Work in Progress (WIP) submissions due: March 15, 2024, 11:59 PM AoE
Notification: On or before March 15, 2024

Submissions by email: Please email your submission as a PDF attachment of the one-page abstract to Suren Byna (byna.1@osu.edu) and Amelie Chi Zhou (amelieczhou@hkbu.edu.hk). Put “Cheops 2024 WIP” as the first part of the message subject.

Organization

Steering Committee

Jean-Thomas Acquaviva - DDN, France
Jalil Boukhobza - National Institute of Advanced Technologies of Brittany (ENSTA Bretagne), France
Konstantinos Chasapis - DDN, France
Kira Duwe - École polytechnique fédérale de Lausanne (EPFL), Switzerland
Shadi Ibrahim - Inria, France
Michael Kuhn - Otto von Guericke University Magdeburg (OVGU), Germany

General Chair

Shadi Ibrahim - Inria, France

Program Chairs

Suren Byna - The Ohio State University, USA
Amelie Chi Zhou - Hong Kong Baptist University, Hong Kong

Program Committee

Jean Luca Bez - Lawrence Berkeley National Laboratory (LBNL), United States of America
Jerry Chou - National Tsing Hua University, Taiwan
Matthieu Dorier - Argonne National Laboratory, United States of America
Hao Fan - Huazhong University of Science and Technology, China
Anna Fuchs - Universität Hamburg, Germany
Christos Kozanitis - Foundation for Research and Technology Hellas (FORTH), Greece
Thomas Lambert - Université de Lorraine, France
Yusen Li - Nankai University, China
Jay Lofstead - Sandia National Laboratories (SNL), United States of America
Diana Moise - Hewlett Packard Enterprise (HPE), Switzerland
Anastasios Papagiannis - Isovalent, Greece
Vasily Tarasov - IBM Research, United States of America
Animesh Trivedi - Vrije Universiteit Amsterdam, Netherlands
Marc-André Vef - Johannes Gutenberg University Mainz, Germany