
The Fifth Workshop on Challenges and Opportunities of Efficient and Performant Storage Systems (CHEOPS'25)

Held in conjunction with ASPLOS & EuroSys 2025 on March 31st 2025, Rotterdam, Netherlands

The workshop will take place in Leeuwen Room I at the Postillion Hotel & Convention Centre WTC Rotterdam, Beursplein 37, 3011 AA Rotterdam, The Netherlands.

Agenda

| Time | Speaker / Authors (Affiliation) | Content |
|---|---|---|
| 8:30-9:00 | | Coffee and Registration |
| 9:00-9:10 | Suren Byna | CHEOPS Welcome |
| 9:10-10:00 | Keynote - Gustavo Alonso (ETH Zurich, Switzerland) | Vertically integrated storage systems |
| 10:00-10:30 | Louis-Marie NICOLAS (Lab-STICC, CNRS UMR 6285, ENSTA, Institut Polytechnique de Paris), Salim MIMOUNI (Atos BDS R&D Data Management), Philippe COUVEE (Atos BDS R&D Data Management), Jalil BOUKHOBZA (Lab-STICC, CNRS UMR 6285, ENSTA, Institut Polytechnique de Paris) | Characterizing the Use of DVFS for HPC I/O Optimization: A Microbenchmarking Approach |
| 10:30-11:00 | | Coffee break |
| 11:00-11:30 | Robin Vonk (Delft University of Technology), Joost Hoozemans (Voltron Data), Zaid Al-Ars (Delft University of Technology) | GSST: Parallel string decompression at 191 GB/s on GPU |
| 11:30-12:00 | Shadi Ibrahim (Inria, Rennes), Jad Darrous (Inria, Rennes) | Erasure Coding Aware Block Placement for Data-Intensive Applications |
| 12:00-12:30 | Invited talk - Jean-Thomas Acquaviva (DDN Storage) | From HPC to AI: A Data Journey |
| 12:30-14:00 | | Lunch break |
| 14:00-14:30 | Zebin Ren (Vrije Universiteit Amsterdam), Krijn Doekemeijer (Vrije Universiteit Amsterdam), Tiziano De Matteis (Vrije Universiteit Amsterdam), Christian Pinto (IBM Research Europe), Radu Stoica (IBM Research Europe), Animesh Trivedi (IBM Research Europe) | An I/O Characterizing Study of Offloading LLM Models and KV Caches to NVMe SSD |
| 14:30-15:00 | Invited talk - Yang Zheng (Huawei Technologies) | Reliability challenges and opportunities for AI infra: from industry perspective |
| 15:00-15:30 | Invited talk - Shadi Ibrahim (Inria, Rennes) | TBD |
| 15:30-16:00 | | Coffee break |
| 16:00-16:15 | Joost Hoozemans (Voltron Data, Delft University of Technology), Robin Vonk (Delft University of Technology), Johan Peltenburg (Voltron Data), Felipe Aramburu (Voltron Data), Zaid Al-Ars (Delft University of Technology) | Using GPU Direct Storage with High-Performance Distributed Filesystems |
| 16:15-16:30 | Pınar Tözün (IT University of Copenhagen), Karl B. Torp (Samsung Denmark Research Center), Simon A. F. Lund (Samsung Denmark Research Center) | A Quest to Reduce Dependency on CPUs in Deep Learning Data Pipelines |
| 16:30-16:50 | All participants | Discussion |
| 16:50-17:00 | CHEOPS Organizers | Closing remarks |
| 18:00-19:30 | | Welcome Reception |

Keynote Speaker

Professor Gustavo Alonso

Keynote: Vertically integrated storage systems

Abstract

Storage systems are often seen as being separated from compute systems. In this talk, I will argue that storage should be vertically integrated into compute units, i.e., storage should be an integral and seamless component actively participating in the processing of data. The motivation to do so is obvious from the sheer amount of data that needs to be processed these days, not only in ML/LLM/AI applications but also in more conventional data analytics. And the concept of vertically integrating storage applies whether it is a local disk or disaggregated storage in the cloud. In fact, the performance characteristics of cloud storage have already led to pushing some amount of data processing down to the storage layer to minimize the amount of data to be transferred across the network and to compute nodes. In the talk, I will argue that these initial steps should be expanded to create a computational pipeline from storage to the compute node memory that includes active storage, smart NICs, and accelerators. While some of these ideas have been pursued in isolation and are often centered around a particular type of technology, today we have the opportunity to think about such designs end-to-end, taking advantage of innovations such as CXL. In the talk I will motivate the idea, suggest ways to implement initial prototypes, and discuss its integration into software systems, which is often the biggest bottleneck when trying to take advantage of hardware advances.

Bio

Gustavo Alonso is a professor in the Department of Computer Science of ETH Zurich, where he is a member of the Systems Group (www.systems.ethz.ch) and the head of the Institute of Computing Platforms. He leads the AMD HACC (Heterogeneous Accelerated Compute Cluster) deployment at ETH (https://github.com/fpgasystems/hacc), a research facility with several hundred users worldwide that supports exploring data center hardware-software co-design. His research interests include data management, cloud computing architecture, and building systems on modern hardware. Gustavo holds degrees in telecommunication from the Madrid Technical University and an MS and a PhD in Computer Science from UC Santa Barbara. Prior to joining ETH, he was a research scientist at IBM Almaden in San Jose, California. Gustavo has received 4 Test-of-Time Awards for his research in databases, software runtimes, middleware, and mobile computing. He is an ACM Fellow, an IEEE Fellow, a Distinguished Alumnus of the Department of Computer Science of UC Santa Barbara, and has received the Lifetime Achievements Award from the European Chapter of ACM SIGOPS (EuroSys).

Invited Speakers

Dr. Jean-Thomas Acquaviva, DDN Storage

Invited Talk: From HPC to AI: A Data Journey

Abstract

Two technological revolutions have recently marked the storage community: the mass availability of Flash and the rise of object-type APIs. These revolutions seem to be coming to an end, and a common architecture is emerging. Data centers tend to become more and more “data centric”, with different computing services attached to a central data space. Because of this centrality, the datahub must meet five main criteria: extreme scalability, uncompromising performance, operational efficiency, 24/7 availability and ease of allocation of shared resources. To what extent does this architecture differ between AI and HPC? In this talk, we will discuss, from an industrial standpoint, the main areas of convergence and differentiation depending on the dominant workload.

Bio

Jean-Thomas successively worked for Intel, the University of Versailles and the French Atomic Commission (CEA). He participated in the creation of their joint laboratory on Exascale Research. At DDN, Jean-Thomas’ role includes overseeing research collaborations in Europe as well as product management for some of DDN’s advanced solutions.

Dr. Shadi Ibrahim, Inria - Rennes

Invited Talk: TBD

Abstract
Bio

Dr. Yang Zheng, Huawei

Invited Talk: Reliability challenges and opportunities for AI infra: from industry perspective

Abstract

AI clusters are emerging as a critical infrastructure and technological frontier. As models grow in size following scaling laws, ensuring stable and reliable operation of large-scale model tasks on massive AI clusters has become a significant challenge in the industry.

Training and inference tasks for large models are highly coupled and have low fault tolerance. Distributed training involves frequent communication between nodes, strong dependencies across parallel domains, and requirements for proper computational accuracy. These factors lead to frequent training interruptions due to hardware failures, slow recovery, and fail-slow behavior. Additionally, silent data corruption can result in model non-convergence. As the scale of training expands, reliability becomes a major bottleneck.

The key challenge is to build a highly available AI system architecture capable of supporting scenarios such as training on clusters with hundreds of thousands of cards, inference on super-nodes with hundreds or thousands of cards, and integrated training-inference tasks. Achieving “zero” perception of fault recovery in business operations is essential for ensuring the reliability of large model infrastructures. Addressing these challenges will be critical for advancing the scalability and robustness of AI systems in the future.

Bio

Dr. Yang Zheng is a Principal Engineer at the Reliability Technology Lab of Huawei Technologies Co., Ltd. He is also currently a member of the reliable AI infra project, focusing on research into AI infra testing, monitoring and recovery. He received his PhD degree from Imperial College London in the UK. His research interests include elastic training/inference and silent data corruption.

Workshop Description

The fifth workshop on “Challenges and Opportunities of Efficient and Performant Storage Systems” (CHEOPS) is aimed at researchers, developers of scientific applications, engineers and everyone interested in the evolution of storage systems. As the developments of computing power, storage and network technologies continue to diverge, the bandwidth performance gap between them widens. This trend, combined with the ever-growing data volumes and data-driven computing such as machine learning, results in I/O and storage limitations, impacting the scalability and efficiency of current and future computing systems. Some of these challenges are quantitative, such as scale to match exascale system requirements, or latency reduction of the software stack to efficiently integrate new generations of hardware like storage class memory (SCM). Some other issues are more subtle and arise with the increased complexity of the storage solutions, like new smarter and more potent data management tools, monitoring systems or interoperability between I/O components or data formats.

The main objective of this workshop is to discuss state-of-the-art research, innovative ideas and experiences that focus on the design and implementation of storage systems in both academic and industrial worlds.

Important Dates

Submission Guidelines

To guarantee the quality of the submissions, we have formed a globally distributed, diverse program committee. All submissions will be reviewed by the program committee. We will use HotCRP to manage the submissions. The reviewing process will be double-blind, with at least 3 reviews for each submission. An online discussion will determine which papers to accept.

Only original and novel work not currently under review in other venues will be considered for publication. Submissions can either be full papers (6 pages) or short papers (4 pages). The page count includes the title, text, figures, and appendices but excludes the references. Submissions must be made electronically as PDF files formatted according to the submission rules of EuroSys. Accepted submissions will have to comply with the EuroSys proceedings format. One author of each accepted paper is required to register for the workshop and present the paper. Extended versions of selected papers will be considered for publication in the ACM SIGOPS Operating Systems Review journal.

Camera-Ready Format

You should use the acmart document class (https://www.acm.org/publications/proceedings-template, the same as for submission), as follows: \documentclass[sigplan,10pt]{acmart}

As mentioned above, you will receive the instructions regarding some LaTeX directives (\setcopyright, \acmConference, \acmDOI etc.) after completing the copyright form.

All accepted papers can use up to 2 additional pages for the camera-ready version, for a final limit of 8 (full papers) or 6 (short papers) pages, references not included.

Note that Type 1 fonts (scalable) should be used, not Type 3 (bitmapped), and that all fonts must be embedded. The type and embedding of fonts can be checked with various tools, including pdffonts. Page numbers should be suppressed. Also make sure that the PDF is searchable by testing the search function in a PDF reader.
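
As an illustration only, the formatting requirements above could be combined into a minimal skeleton like the one below. The title, author, affiliation, and section text are placeholders, and the use of \settopmatter{printfolios=false} to suppress page numbers is one possible approach with acmart, not an instruction from the organizers. The rights directives are deliberately left out, since their values only become available after completing the copyright form.

```latex
% Minimal camera-ready sketch. All names and text below are placeholders.
\documentclass[sigplan,10pt]{acmart}

% Assumption: suppress page numbers through acmart's top-matter settings.
\settopmatter{printfolios=false}

% Place the directives received after completing the copyright form here
% (\setcopyright, \acmConference, \acmDOI, etc.).

\begin{document}

\title{Placeholder Title}

\author{Placeholder Author}
\affiliation{%
  \institution{Placeholder Institution}
  \country{Placeholder Country}}

\begin{abstract}
Placeholder abstract.
\end{abstract}

\maketitle

\section{Introduction}
Placeholder text.

\end{document}
```

After compiling, the resulting PDF can be checked with a tool such as pdffonts to confirm that every font is embedded and that no Type 3 fonts remain.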

Information regarding the Rights forms and Uploading Final versions is coming soon.

Topics of Interest

Submissions may be more hands-on than research papers and we therefore explicitly encourage submissions in the early stages of research. Topics of interest include, but are not limited to:

WORK IN PROGRESS (WIP) SESSION

There will be a WIP session where presenters give brief (5-minute) talks on their ongoing work, with fresh problems/solutions. WIP talks will not be included in the proceedings. A one-page abstract is required.

DEADLINES

Submissions by email: Please email your submission as a PDF attachment of the one-page abstract to Amelie Chi Zhou (amelieczhou@hkbu.edu.hk) and Kira Duwe (kira.duwe@epfl.ch). Put “CHEOPS 2025 WIP” as the first part of the message subject.

Organization

Steering Committee

General Chair

Program Chairs

Program Committee