Asynchronous I/O in SCORPIO Boosts E3SM Throughput

  • May 28, 2026
  • Feature Story

    Introduction

    I/O performance often limits how frequently Earth system model simulations can write output. As model resolution increases or output is written more frequently, the volume of data written to the file system grows substantially, complicating data management. Compressing the output can reduce the data volume, but it further slows down simulations.

    Artificial Intelligence (AI) training would also benefit if E3SM could write data in formats other than NetCDF. However, the I/O libraries supporting these emerging data formats may not scale effectively for high-resolution E3SM simulations.

    A potential approach to address these issues is to offload writing model output and related processing to a dedicated I/O thread, enabling asynchronous handling of the model output.

    Asynchronous I/O architecture

    The team has added asynchronous I/O support to the SCORPIO library for model output written with HDF5 in the NetCDF4 format. Since asynchronous I/O is handled within the SCORPIO library, no source code changes were required for the individual E3SM model components. E3SM users can enable asynchronous I/O for each model component using CIME while configuring and building the model. This feature will soon be extended to support other low-level I/O libraries.

    Figure 1. Overview of Asynchronous writes using SCORPIO.

    As shown in Figure 1 above, each model component writes model output variables into multiple history and restart files. The data for each variable in the compute processes is aggregated and/or rearranged by the SCORPIO library to the I/O processes. Each I/O process launches a separate thread to write this data to the file system. The parent I/O process queues each asynchronous operation (such as creating a file or writing a variable) into an asynchronous multi-threaded queue in the SCORPIO library. The I/O threads process these operations asynchronously while the compute processes continue with the model simulation steps.
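
    The queueing pattern described above can be sketched in a few lines of Python. This is only an illustration of the general producer/consumer idea, not SCORPIO's implementation: the `AsyncWriter` class, the operation names, and the file name are all hypothetical, and the "write" simply records the operation instead of touching the file system.

```python
import queue
import threading

_STOP = object()  # sentinel telling the I/O thread to exit


class AsyncWriter:
    """Toy stand-in for one I/O process with a single background I/O thread."""

    def __init__(self):
        self._ops = queue.Queue()  # thread-safe FIFO of pending operations
        self.log = []              # order in which operations were processed
        self._thread = threading.Thread(target=self._drain, daemon=True)
        self._thread.start()

    def _drain(self):
        # Runs on the I/O thread: process queued operations in FIFO order.
        while True:
            op = self._ops.get()
            if op is _STOP:
                break
            # A real writer would create the file or write the variable here.
            self.log.append(op)

    def enqueue(self, name, payload=None):
        # Called by the parent I/O process; returns without waiting for the write.
        self._ops.put((name, payload))

    def finalize(self):
        # Blocks until every pending operation has been processed.
        self._ops.put(_STOP)
        self._thread.join()


def run_demo():
    writer = AsyncWriter()
    writer.enqueue("create_file", "history.nc")  # hypothetical file name
    for day in range(1, 4):
        writer.enqueue("write_variable", ("T", day))
        # ... the model would advance to the next simulated day here ...
    writer.finalize()
    return writer.log


log = run_demo()
```

    In this sketch `enqueue` returns immediately, so the caller can continue computing, while `finalize` is the only point that waits on pending operations.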

    Asynchronously processing the model output overlaps the writing of the output with the model simulation, potentially improving the overall model throughput. Preliminary performance tests run on Perlmutter at NERSC show I/O speed-ups of 28% for simulations in the configuration that E3SM considers low resolution (atmosphere at ~100 km and ocean/sea ice components at 60 km to 30 km resolution).

    I/O Performance

    To measure the performance of asynchronous I/O, the researchers ran the E3SM low-resolution (as defined above) “watercycle” case for 4 simulated days, using 1024 MPI processes on 8 compute nodes on Perlmutter at NERSC. To stress the I/O system with only a few simulated days, the model history and restart outputs were written daily, which is much more frequent than normal. The output was written by SCORPIO using the HDF5 I/O library and compressed with the ZFP library using 16-bit fixed-precision lossy compression. Each compute node had 8 I/O processes, and each I/O process launched an asynchronous I/O thread to offload writing the model output.
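
    The process counts implied by this setup can be worked out with a short calculation. The sketch below assumes the MPI processes are spread evenly across the nodes, which the article does not state explicitly; the `io_layout` helper is hypothetical.

```python
def io_layout(total_ranks, nodes, io_procs_per_node):
    """Derive per-node and total counts, assuming ranks are spread evenly."""
    total_io_procs = nodes * io_procs_per_node
    return {
        "ranks_per_node": total_ranks // nodes,
        "io_processes": total_io_procs,
        "io_threads": total_io_procs,  # one asynchronous thread per I/O process
        "ranks_per_io_process": total_ranks // total_io_procs,
    }


layout = io_layout(total_ranks=1024, nodes=8, io_procs_per_node=8)
print(layout)
```

    Under that assumption, the run uses 64 I/O processes (and 64 I/O threads) in total, or one I/O process for every 16 MPI processes.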

    Figure 2. Asynchronous write performance on Perlmutter.

    Figure 2 above shows the E3SM model run and finalize times for the E3SM low-resolution (ne30pg2) watercycle case (E3SM compset: WCYCL_1850). The E3SM initialization time is not shown since the initialization phase primarily reads input files and does not include any asynchronous I/O. In the “No I/O” case, all model output is disabled. The “Sync I/O” case is the default I/O mode, where the model waits for I/O to complete. The “Async I/O” case uses an asynchronous I/O thread to write all model output. The model “finalize” time, the time spent calling the finalize subroutine, is negligible in the other cases but greater in the “Async I/O” case because the model waits on pending asynchronous I/O operations. However, the overall time taken for writing the model output (311 s) is less than in the “Sync I/O” case (433 s), an overall speed-up of 28%. In the “Async I/O” case, the model output to history and restart files for the first three simulated days is written in parallel with the model simulation, improving the overall model throughput.
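
    The 28% figure follows from the two times quoted above if the speed-up is read as the relative reduction in time, which is an interpretation and not something the article states. A minimal check, with a hypothetical helper name:

```python
def relative_reduction(baseline_s, improved_s):
    """Fraction of the baseline time saved by the improved run."""
    return (baseline_s - improved_s) / baseline_s


sync_s = 433.0   # "Sync I/O" time from the text
async_s = 311.0  # "Async I/O" time from the text

reduction = relative_reduction(sync_s, async_s)
print(f"{reduction:.1%}")  # prints 28.2%
```

    Expressed as a ratio instead, the same numbers give 433 / 311, or roughly a 1.39x speed-up.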

    Conclusion & Future Work

    The preliminary performance results are encouraging and suggest that asynchronous I/O is an effective approach for achieving portable, high-performance I/O. The team will continue to measure and optimize the asynchronous I/O implementation for HDF5, and plans to extend similar support to other low-level I/O libraries in the near future.

    At present, the asynchronous I/O framework handles only part of the parallel I/O process asynchronously: the parallel write of the data. In the future, the authors intend to also make the data transfer from compute to I/O processes asynchronous, which should further improve overall I/O performance.

    Contact

    • Jayesh Krishna, Argonne National Laboratory
    • Rob Jacob, Argonne National Laboratory
     
     

    This article is a part of the E3SM “Floating Points” Newsletter, to read the full Newsletter check:

     
