Portable Software Environment for Ultrahigh-resolution E3SM Land Model Development on GPUs

  • May 15, 2025
  • Feature Story

    Background

    Earth System Science relies on complex models to understand the planet’s processes, often requiring supercomputers with specialized architectures like GPUs. However, running these models consistently across different systems can be challenging due to variations in hardware and software setups.

    Docker is an innovative platform designed to simplify software development and deployment through containerization, which packages applications with their dependencies into standardized units. Docker ensures that programs run consistently across various computing environments — be it a personal laptop, a cloud server, or a high-performance supercomputer. This capability is particularly important for researchers, as it facilitates the sharing of complex workflows and experimental setups, streamlines collaboration, and minimizes compatibility issues.

    Summary

    Researchers have developed a portable Docker environment to support the ultra-high-resolution E3SM Land Model (uELM), targeting exascale computers with NVIDIA GPUs [1]. The environment is intended to facilitate community-based uELM development and testing across CPU and GPU platforms by providing preconfigured datasets, simulation cases, source code, unit testing, and model testbeds. Its key technical features include GPU-ready container generation, code management, and input data distribution.

    Introduction

    Earth system models (ESMs), such as the Energy Exascale Earth System Model (E3SM), are crucial tools for understanding the dynamics of the Earth system and the relationships between human activities and the planet. The E3SM Land Model (ELM), a component of E3SM, simulates land surface processes and their interactions with other Earth system components. ELM is a complex codebase that depends heavily on high-performance computing (HPC) environments. Recent work has focused on developing a large-scale, ultra-high-resolution ELM (uELM) for exascale computers with GPUs [2,3]. However, uELM development is challenging because of its HPC requirements and its need for specialized machine and software configurations.

    To address this, researchers have developed a portable software environment that simplifies uELM development and testing on hybrid CPU/GPU architectures. This environment provides preconfigured components and tools, streamlining the process and promoting community-based collaboration.

    Figure 1. Portable Software Environment and its deployment in computational resources

    Portable Software Environment for uELM development

    The portable software environment is designed to streamline uELM development and testing on hybrid CPU/GPU architectures. It uses OS-level virtualization (Docker) to encapsulate the code, libraries, and system software required for uELM development, specifically targeting NVIDIA GPUs.

    Creating GPU-ready uELM containers involves a series of steps: identifying a Docker image with NVIDIA GPU support, customizing it for ELM simulation, integrating a Functional Unit Test (FUT) tool for standalone uELM module creation, deploying the Software Package for ELM Development (SPEL) [4,5] for GPU-ready uELM code porting and optimization, and conducting end-to-end uELM code development on GPUs. SPEL is a software tool that facilitates uELM code generation and optimization on GPUs using compiler directives within a FUT framework. It also lets users develop their own strategies for GPU code porting, resource utilization, and performance maximization.
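
    As a minimal sketch of the first step, the Python snippet below assembles a `docker run` command that exposes the host's NVIDIA GPUs to a container. The `--gpus` flag is a standard Docker CLI option (it relies on the NVIDIA Container Toolkit being installed on the host). The image name `uelm-dev:gpu` is a hypothetical placeholder, not the name of the actual uELM image.

```python
import shlex


def gpu_run_command(image, gpus="all", command=("nvidia-smi",)):
    """Build a `docker run` argument list that exposes NVIDIA GPUs."""
    return [
        "docker", "run", "--rm",
        "--gpus", gpus,   # standard Docker flag for GPU access
        image,
        *command,         # command executed inside the container
    ]


# Hypothetical image name, used for illustration only.
cmd = gpu_run_command("uelm-dev:gpu")
print(shlex.join(cmd))
```

    Running `nvidia-smi` inside the container is a quick way to confirm that the GPUs are visible before any model code is built. The list can be passed to `subprocess.run(cmd, check=True)` on a machine where Docker is installed.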

    Standard uELM containers, equipped with E3SM source code, input data, and tools like Offline Land Model Testbed (OLMT) and SPEL, are designed for uELM development on CPU-only or hybrid CPU/GPU systems. These containers simplify uELM simulation case creation, code development, and data sharing across computational platforms.

    Code management and data sharing are handled within the uELM environment. The E3SM project hosts its source code on GitHub, and users can map a local Git folder into the uELM container. For data sharing, users can mount local data directories or use the Network File System (NFS) if the resources are on the same network.
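
    The mapping of a local Git folder and a local data directory can be sketched with Docker bind mounts. The snippet below is illustrative only: the `-v host:container[:ro]` syntax is standard Docker, but the image name and every path shown are hypothetical and would need to match the actual container layout.

```python
def mount_args(host_dir, container_dir, read_only=False):
    """Return the `-v` bind-mount arguments for `docker run`."""
    spec = f"{host_dir}:{container_dir}"
    if read_only:
        spec += ":ro"  # mount read-only so shared inputs cannot be modified
    return ["-v", spec]


def dev_run_command(image, repo_dir, data_dir):
    """Build an interactive `docker run` command with code and data mounted."""
    return [
        "docker", "run", "-it", "--rm",
        *mount_args(repo_dir, "/workspace/E3SM"),             # hypothetical path
        *mount_args(data_dir, "/inputdata", read_only=True),  # hypothetical path
        image, "bash",
    ]


# All names below are placeholders for illustration.
dev_cmd = dev_run_command("uelm-dev:gpu", "/home/user/E3SM", "/data/elm_inputs")
print(" ".join(dev_cmd))
```

    Keeping the repository on the host means edits and Git history persist after the container exits, while the read-only data mount protects shared input datasets.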

    The uELM software environment offers a versatile solution for uELM development and testing, promoting collaboration and accelerating advancements in Earth system modeling (Fig. 1).

    Case study and demonstration

    The uELM software environment is currently deployed in federated computational resources at ORNL, including laptops, PCs, an NVIDIA DGX-station, and a virtual machine in the ORNL open cloud. These resources are used for uELM development and testing.

    A standard container includes an example case that uses observational forcing data to drive uELM simulations at 42 FluxNet sites globally. Specifically, the container provides input datasets and batch scripts for configuring, building, and automatically launching the simulations. A data duplication function lets users replicate datasets to create larger simulations for testing parallel computing performance and scalability. The ‘OLMT_docker_42fluxnetsites_example.sh’ script creates three simulation cases: ad-spinup, transient run, and future projection. These simulations serve as reference solutions for SPEL FUT tests and end-to-end uELM development on GPUs (Fig. 2).

    Figure 2. Locations of the 42 FluxNet sites
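
    The idea behind the data duplication function can be illustrated with a short sketch. The article does not describe how the container's function is implemented, so the code below is an assumption-laden stand-in: it treats each site as a dictionary record and tiles the 42-site list to produce a larger synthetic domain. The field names and placeholder coordinates are hypothetical.

```python
def duplicate_sites(sites, copies):
    """Replicate a list of site records `copies` times.

    Each replica receives a unique id so it can be treated as a separate
    grid cell in a larger scalability test.
    """
    if copies < 1:
        raise ValueError("copies must be >= 1")
    replicated = []
    for k in range(copies):
        for site in sites:
            new_site = dict(site)                  # copy; leave the original untouched
            new_site["id"] = f"{site['id']}_r{k}"  # unique id per replica
            replicated.append(new_site)
    return replicated


# 42 placeholder records standing in for the FluxNet sites.
sites = [{"id": f"site{i:02d}", "lat": 0.0, "lon": 0.0} for i in range(42)]
big = duplicate_sites(sites, copies=10)
print(len(big))  # 420
```

    Because every replica reuses the same forcing data, the workload grows in a controlled way, which is what a scaling test needs.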

    The SPEL toolkit is integrated into the standard uELM development container. It generates standalone functional unit test programs for individual uELM modules, including read and write code for I/O, verification code to compare outputs, and a FUT driver.
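
    The verification step can be sketched as a tolerance-based comparison between reference (CPU) outputs and outputs from a ported module. This is not SPEL's actual verification code, which the article does not show; the function name, the variable name `gpp`, and the tolerances are all assumptions made for illustration.

```python
import math


def compare_outputs(reference, candidate, rel_tol=1e-12, abs_tol=0.0):
    """Return a list of (variable, index, ref_value, cand_value) mismatches.

    A missing variable or a length mismatch is reported with index None.
    """
    mismatches = []
    for name, ref_values in reference.items():
        cand_values = candidate.get(name)
        if cand_values is None or len(cand_values) != len(ref_values):
            mismatches.append((name, None, None, None))
            continue
        for i, (r, c) in enumerate(zip(ref_values, cand_values)):
            if not math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol):
                mismatches.append((name, i, r, c))
    return mismatches


# Hypothetical outputs from a CPU reference run and a GPU run.
cpu = {"gpp": [1.0, 2.0, 3.0]}
gpu = {"gpp": [1.0, 2.0, 3.0 + 1e-6]}

strict = compare_outputs(cpu, gpu)               # tight tolerance flags the last value
loose = compare_outputs(cpu, gpu, rel_tol=1e-5)  # looser tolerance accepts it
print(len(strict), len(loose))
```

    Exposing the tolerance as a parameter matters because floating-point results on GPUs can differ slightly from CPU results, for example when the order of operations changes.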

    The standard container is also used for end-to-end uELM development with user development branches. The process involves checking out the latest uELM development branch into a local repository and mounting it into the container.

    Finally, the uELM software environment serves as a collaborative platform for interactive code development. A virtual machine on the ORNL OpenCloud with a static IP address and a unified user ID allows authorized users to access and co-develop uELM simultaneously from their own computers, which benefits geographically distributed teams.

    Future plan: The current focus is on NVIDIA GPUs and OpenACC. Future plans include incorporating support for AMD GPUs and OpenMP, as well as establishing a multi-user uELM development environment on HPC clusters using Apptainer.

    References

    1. D. Wang, P. Schwartz, F. Yuan, D. Ricciuto, P. Thornton, C. Layton, and F. Eagerbarge, “Portable Software Environment for Ultrahigh-resolution E3SM Land Model Development on GPUs,” Journal of Computing and Communication, 2025, doi: 10.4236/jcc.2025.132003.
    2. D. Wang, P. Schwartz, F. Yuan, P. Thornton, and W. Zheng, “Towards ultrahigh-resolution E3SM land modeling on exascale computers,” Computing in Science & Engineering, no. 01, pp. 1–14, 2022.
    3. F. Yuan, D. Wang, S.-C. Kao, M. Thornton, D. Ricciuto, V. Salmon, C. Iversen, P. Schwartz, and P. Thornton, “An ultrahigh-resolution E3SM land model simulation framework and its first application to the Seward Peninsula in Alaska,” Journal of Computational Science, vol. 73, p. 102145, 2023.
    4. P. Schwartz, D. Wang, F. Yuan, and P. Thornton, “SPEL: Software tool for porting E3SM land model with OpenACC in a function unit test framework,” in Accelerator Programming–WACCPD 2022: 9th Workshop on Accelerator Programming Using Directives, Dallas, USA, Nov 18, 2022, Proceedings. Springer, 2022, pp. 1–14.
    5. P. Schwartz, D. Wang, F. Yuan, and P. Thornton, “Developing an ELM ecosystem dynamics model on GPU with OpenACC,” in Computational Science–ICCS 2022: 22nd International Conference, London, UK, June 21–23, 2022, Proceedings, Part II. Springer, 2022, pp. 291–303.