NERSC Hackathon: Profiling EAMxx on NVIDIA GPUs
In collaboration with the NERSC Science Acceleration Program (NESAP), EAMxx group members Noel Keen, Luca Bertagna, and Naser Mahfouz participated in the Open Hackathon Series, held virtually at LBNL in August 2024. The group worked closely with Akshay Subramanian from NVIDIA. Their goal was to profile the EAMxx model at multiple levels, from the application level, covering the entire atmosphere–land run time, down to the subprocess level in the physics parameterization implementations and the dynamics solver. During the hackathon, the group sped up the EAMxx model by approximately 10% on NERSC’s Perlmutter GPU system.
To facilitate profiling the EAMxx codebase with available tools such as Nsight Systems and Nsight Compute, the code first had to be annotated with labels for easier interpretation and given build-time and run-time configuration options. After implementing these infrastructure changes, the team was able to profile EAMxx at run time and visualize bottlenecks and bugs in the code (Fig. 1). Through profiling, the team identified existing bugs and reported them to the wider development team. They also identified potentially unnecessary run-time debugging and synchronization activity that lowers performance.
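As an illustration of what such labeling can look like, the following is a minimal C++ sketch, not the EAMxx implementation. It wraps the NVTX range calls `nvtxRangePushA` and `nvtxRangePop` in a scope guard so that a labeled region shows up as a named range on the Nsight Systems timeline. The build-time switch `SKETCH_USE_NVTX`, the names `ScopedRange`, `range_push`, `range_pop`, `range_log`, and `run_step`, and the region labels are all hypothetical and chosen for this example only. The in-memory log exists solely so the nesting can be checked on a machine without the CUDA toolkit.

```cpp
#include <cassert>
#include <string>
#include <vector>

#ifdef SKETCH_USE_NVTX              // hypothetical build-time switch for this sketch
#include <nvtx3/nvToolsExt.h>       // NVTX C API shipped with the CUDA toolkit
#endif

// Hypothetical in-memory record of range events, used only to verify nesting.
inline std::vector<std::string>& range_log() {
  static std::vector<std::string> log;
  return log;
}

// Current nesting depth of open ranges.
inline int& range_depth() {
  static int depth = 0;
  return depth;
}

// Open a labeled range; forwarded to NVTX only when the switch is defined.
inline void range_push(const char* label) {
#ifdef SKETCH_USE_NVTX
  nvtxRangePushA(label);
#endif
  ++range_depth();
  range_log().push_back(std::string("push:") + label);
}

// Close the most recently opened range.
inline void range_pop() {
  assert(range_depth() > 0 && "range_pop called without a matching range_push");
#ifdef SKETCH_USE_NVTX
  nvtxRangePop();
#endif
  --range_depth();
  range_log().push_back("pop");
}

// RAII guard: the range is closed automatically when the scope exits.
class ScopedRange {
public:
  explicit ScopedRange(const char* label) { range_push(label); }
  ~ScopedRange() { range_pop(); }
  ScopedRange(const ScopedRange&) = delete;
  ScopedRange& operator=(const ScopedRange&) = delete;
};

// Hypothetical stand-in for one model step with nested labeled regions.
inline void run_step() {
  ScopedRange step("atm_step");
  {
    ScopedRange dyn("dynamics");
    // ... dynamics solver work would go here ...
  }
  {
    ScopedRange phys("physics");
    // ... physics parameterization work would go here ...
  }
}
```

Compiled without the switch, the guards cost little more than a log entry, which is one way to keep profiling labels in the code while leaving them inactive by default. With the switch defined and the program run under a profiler that traces NVTX, each label would appear as a nested range around the work it encloses.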
The group presented their findings to the entire EAMxx team after the hackathon. The hackathon’s success was a testament to a balanced team and a flexible strategy, which together led to substantial improvements. Continued effort is needed to debug the code, audit synchronization strategies, and explore further enhancements for sustained performance gains.
References
- NERSC GPU Hackathon, August 24
- NERSC Open Hackathons: Upcoming Events
- Technical Resources
- NERSC Training
This article is part of the E3SM “Floating Points” Newsletter. To read the full Newsletter, check: