From Weeks to Hours: GPUs Supercharge DP-SCREAMv1
The E3SM team has introduced DP-SCREAMv1, a doubly periodic (hence DP-), process-level configuration that brings the Graphics Processing Unit (GPU)-accelerated C++/Kokkos engine of SCREAMv1 into compact domains for rapid experimentation. The configuration preserves methodological continuity with the global model, retaining the PG2 physics grid, conservative semi-Lagrangian tracer transport, and modern process sequencing, so results remain directly comparable while computational costs are dramatically reduced. This level of consistency was not achievable in the prototype Fortran version, DP-SCREAMv0.
Performance gains are substantial. Across standard 3.25 km-resolution domains, DP-SCREAMv1 achieves 3–8× speedups on GPUs relative to CPU runs. In practical workflows, this transforms formerly long-running simulations into routine exercises: a 200-day RCEMIP channel experiment that once occupied CPU nodes for most of a week now completes in under half a day on Perlmutter GPUs, eliminating the need for restarts and minimizing queue delays. For high-resolution simulations, previously infeasible in deep convective regimes with DP-SCREAMv0, “Giga-LES”-style runs with roughly one billion grid points at 100 m resolution now finish in under two hours per simulated day on 512 Perlmutter GPU nodes, enabling multiple realizations or long-duration

Figure 1. Snapshots of pseudo-optical depth from a 100-day DP-SCREAMv1 rotating radiative–convective equilibrium (RCE) simulation. The configuration employs a 3000 km × 3000 km domain with a resolution of 3.25 km, prescribed uniform sea surface temperatures (SSTs) of 305 K, and a Coriolis parameter of 4.99 × 10^-5 s^-1, corresponding to the value at 20° N.
These computational advances are already expanding the frontier of process-level experimentation. Using DP-SCREAMv1, rotating radiative–convective equilibrium (RCE) simulations on a 3,000 km × 3,000 km domain can be completed in just a few hours, allowing detailed study of tropical-cyclone formation and lifecycle within an idealized framework (Fig. 1). Similarly, high-resolution RCEMIP experiments now permit investigation of the sensitivity of results to model resolution and of equilibrium climate sensitivity (ECS). In a proof-of-concept 200 m-resolution run (Fig. 2), DP-SCREAMv1 exhibited a clear reduction in the characteristic “popcorn” convection seen at coarser grid spacing, with convection aggregating into larger, more coherent systems. These examples illustrate how GPU acceleration not only shortens turnaround time but also broadens the scientific scope of experiments, making multi-month, high-resolution studies of organized convection and tropical dynamics both practical and routine.
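For readers setting up a similar rotating RCE case, the Coriolis parameter quoted for the Figure 1 configuration can be checked with the standard relation f = 2Ω sin(φ). The short Python sketch below is illustrative only and is not taken from the DP-SCREAMv1 case scripts; the function name `coriolis_parameter` and the rounded rotation rate `OMEGA` are assumptions made for this example.

```python
import math

# Earth's rotation rate in rad/s (assumed rounded value for this sketch).
OMEGA = 7.2921e-5


def coriolis_parameter(lat_deg: float) -> float:
    """Return the Coriolis parameter f = 2 * Omega * sin(latitude) in s^-1.

    Hypothetical helper for illustration; not part of E3SM or scmlib.
    """
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


# Latitude used for the rotating RCE simulation in Figure 1.
f_20n = coriolis_parameter(20.0)
print(f"f at 20 N = {f_20n:.2e} s^-1")
```

With these inputs the result rounds to 4.99 × 10^-5 s^-1, matching the value in the figure caption. Changing the latitude argument gives the corresponding f for other idealized rotating setups.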
DP-SCREAMv1 is distributed with a growing case library of more than 30 modern and idealized setups, as part of the E3SM Single-Column/DP process-level ecosystem, lowering barriers to adoption and reproducibility. Documentation, setup guidance, case scripts, and diagnostics are available on the scmlib wiki: https://github.com/E3SM-Project/scmlib/wiki. Together, GPU acceleration, methodological consistency, and a curated case library position DP-SCREAMv1 as a powerful and practical engine for high-resolution, process-level discovery and model development within E3SM.
Paper reference:
- Bogenschutz, P. A., Clevenger, T. C., Bradley, A. M., Caldwell, P. M., Beydoun, H., Mahfouz, N., Keen, N. D., Guba, O., Bertagna, L., Foucar, J. G., Zhang, J., and Donahue, A. High Performance, High Fidelity: A GPU-Accelerated Doubly-Periodic Configuration of the Simple Cloud-Resolving E3SM Atmosphere Model Version 1 (DP-SCREAMv1). J. Adv. Model. Earth Syst., doi:10.1029/2025MS005127.
More from Floating Points:
- Resources for Getting to Know EAMxx/SCREAM
- EAMxx Decadal Run
- E3SM SCREAM Team Receives LLNL Deputy Director’s Science and Technology Excellence in Publication Award
- https://e3sm.org/overview-paper-for-the-doubly-periodic-scream-configuration-dp-scream/ (DP-SCREAMv0)
- https://e3sm.org/new-physgrid-and-dycore-methods-speed-up-eam-by-2x/ (PG2)
