EKea: E3SM Kernel Extraction and Analysis Framework

  • November 29, 2022
  • The E3SM Kernel Extraction and Analysis (EKea) framework automates the process of extracting kernels from E3SM, providing an important capability for E3SM developers. A kernel extracted using EKea (an “EKea kernel”) is a standalone piece of software derived from a selected region of the larger E3SM codebase. A kernel can be compiled and executed independently, providing a simpler platform for experimentation without setting up the whole model and its software dependencies. Using a kernel, one can focus on the desired E3SM code region for purposes such as computational performance optimization, code debugging, unit testing, and GPU porting. EKea originated from the Fortran Kernel Generator (KGen) effort but underwent a substantial rewrite to incorporate a bottom-up modular design. While EKea is customized extensively for streamlined integration with the E3SM build system and supported use-cases, this modular design also facilitates its use as a framework for developing additional Fortran analysis tools.

    Automated kernel extraction

    While kernel-driven development benefits many software engineering tasks, generating a kernel manually is tedious. A straightforward copy/paste of a targeted region of code does not by itself produce a standalone unit that compiles and runs. With manual kernel extraction, one typically has to scan through all source files to find the statements required to set up the relevant context, such as variable declarations and dependencies on other modules. Preparing the state data that drives the execution of a generated kernel is even harder, as it requires runtime interception. To simplify this process, EKea automates most of the kernel extraction tasks through static code analysis and code generation.

    How EKea works

    EKea is a command-line tool running under Linux that drives the kernel extraction workflow using Python modules. The user interface is designed for E3SM model users and requires at least two arguments: 1) the desired code region for kernel extraction and 2) the path to an E3SM case directory. EKea pulls additional information about the case using the “xmlquery” script located in the case directory. Figure 1 depicts the EKea workflow.
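    As a rough illustration of how a tool can pull case information through “xmlquery”, the sketch below wraps the script in a small Python helper. This is not EKea's actual code: the helper names are hypothetical, and the choice of which case variables to query (EXEROOT is used in the usage note) is an assumption. The only external behavior relied on is that the case's xmlquery script accepts a --value option that prints just the value of the requested variable.

```python
import subprocess
from pathlib import Path


def xmlquery_command(case_dir, variable):
    """Build the argv for querying one case variable (hypothetical helper)."""
    script = Path(case_dir) / "xmlquery"
    # --value asks xmlquery to print only the value, without the variable name.
    return [str(script), "--value", variable]


def query_case(case_dir, variable):
    """Run xmlquery inside the case directory and return the value as text."""
    result = subprocess.run(
        xmlquery_command(case_dir, variable),
        cwd=case_dir,          # xmlquery is meant to be run from the case directory
        capture_output=True,
        text=True,
        check=True,            # raise CalledProcessError if xmlquery fails
    )
    return result.stdout.strip()
```

    With a real case directory, a call such as query_case("/path/to/case", "EXEROOT") would return the build directory recorded for that case; the path shown here is a placeholder.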

    Figure 1. EKea workflow diagram. EKea is invoked with information on the targeted code region for extraction and the path to an E3SM case directory.

    Once EKea has gathered the requisite information, it first builds the E3SM case to collect the compiler flags used for every source file. It then analyzes the E3SM sources and identifies the statements that should be extracted into a kernel. Next, it instruments the original E3SM code and runs it to collect timing data for the specified region. After that, it generates the kernel source files from the statements identified during the analysis step. Finally, it generates and saves the state data files that drive the standalone kernel execution.

    Illustrated use-cases

    Once a kernel is extracted, it is a self-contained software artifact representing a part of E3SM. As it can be compiled and executed without any external libraries or MPI, the user can conveniently experiment with the kernel for various purposes such as the use-cases described below.

    Computational performance optimization

    The initial goal when EKea was envisioned was to facilitate performance analysis and optimization of a targeted region of code. To keep pace with active model development, an optimization task can be carried out on a kernel concurrently and integrated back into the model when it is ready. Furthermore, a large optimization task can be split into smaller sub-tasks so that several developers can work in parallel.

    Code debugging

    Debugging a parallel scientific application at scale is cumbersome even with the right tools. To begin with, one needs to run E3SM under a debugger to identify the potential source of the problem, or make an informed guess about which line of code to start debugging from. As a typical E3SM experiment runs on multiple processes using MPI, the user has to start or attach an HPC-ready debugger such as gdb4hpc during execution. Furthermore, on an HPC system with a job scheduler, a debugging job may spend considerable time in the scheduler queue before it can start. EKea kernels, in contrast, are written as sequential software, so the code can be built and debugged even on a laptop, which makes debugging much easier.

    Simulation comparison

    Sometimes, the same E3SM case can produce different simulation outputs on different systems. Figuring out the root cause of the difference is generally a time-consuming task. By extracting a kernel from a “suspicious” region of E3SM code, one can quickly compare the results between two systems. One success story of this type involved an issue with fused multiply-add (FMA) instructions, which are available on only some compute architectures: one system used FMA while the other did not. An extracted kernel was key to a quick modify-run-verify cycle during the investigation of this issue.
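    To see why FMA alone can change results, the following Python sketch contrasts the two evaluation orders. It is an illustration, not code from EKea or E3SM: the fused result is emulated with exact rational arithmetic (compute a*b + c exactly, then round once), and the input values are chosen by hand so that the single rounding and the double rounding disagree.

```python
from fractions import Fraction


def unfused_multiply_add(a, b, c):
    """a*b is rounded to double, then the sum is rounded again (two roundings)."""
    return a * b + c


def fused_multiply_add(a, b, c):
    """Emulate FMA: compute a*b + c exactly, then round once to double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# a*b is exactly 1 - 2**-54, which lies halfway between two doubles
# and rounds to 1.0 when the product is stored on its own.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print(unfused_multiply_add(a, b, c))  # 0.0
print(fused_multiply_add(a, b, c))    # -2**-54, about -5.55e-17
```

    A difference of this size is harmless in a single operation, but it can be amplified over many time steps, which is why isolating the affected region in a small kernel shortens the investigation.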

    Unit testing

    Having a suite of unit tests is widely recognized as a best practice for software development projects. Despite well-intentioned efforts, unit testing has not seen adequate adoption in scientific software development, especially in Fortran application development. By making it easy to generate E3SM kernels, EKea can be used to put together a comprehensive test suite.
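    The core of such a test is a comparison between the values a kernel computes and the reference values saved from the original run. The Python sketch below shows one way to express that check. It is an assumption-laden illustration rather than EKea's own verification logic: the function names, the normalized RMS metric, and the default tolerance are all hypothetical choices.

```python
import math


def normalized_rms_difference(computed, reference):
    """Root-mean-square difference, scaled by the RMS of the reference values."""
    if len(computed) != len(reference):
        raise ValueError("computed and reference must have the same length")
    if not reference:
        raise ValueError("reference must not be empty")
    diff_sq = sum((c - r) ** 2 for c, r in zip(computed, reference))
    ref_sq = sum(r ** 2 for r in reference)
    if ref_sq == 0.0:
        # All-zero reference: fall back to the unscaled RMS difference.
        return math.sqrt(diff_sq / len(reference))
    return math.sqrt(diff_sq / ref_sq)


def kernel_output_matches(computed, reference, tolerance=1.0e-14):
    """Return True when the kernel output agrees with the saved reference.

    The tolerance is a hypothetical default; a real test would choose it
    per variable.
    """
    return normalized_rms_difference(computed, reference) <= tolerance
```

    A test harness could call kernel_output_matches once per output variable and fail the test on the first mismatch, reporting the variable name and the measured difference.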

    EKea as a Fortran analysis framework

    Due to its modular design, EKea can be used as a framework for developing Fortran analysis tools. To demonstrate this capability, the EKea distribution comes with two examples: 1) a kernel timing generator and 2) a kernel variable analyzer. Please see the EKea documentation for details (“ekea: E3SM Kernel Extraction and Analysis”, ekea 1.2.0 documentation). Additionally, we are investigating EKea extensions to study the impact of mixed-precision arithmetic in targeted kernels.

    Machine Learning

    EKea can potentially facilitate the creation of, and experimentation with, surrogate models of targeted regions. Specifically, the runtime interception capability can be used to aggregate state data from a larger E3SM simulation to drive the training of a machine learning surrogate. This is an evolving capability, and we invite collaboration with interested users to target their workflows and address gaps.

    User feedback

    We welcome feedback as further use of extracted kernels is essentially up to the users’ imagination.

    While EKea has proven useful, it has certain limitations. Please check the documentation for known issues and submit any feature requests or bug reports on our GitHub page. We also welcome any additional feedback on improving the user experience.

     

    Contact

    • Youngsung Kim and Sarat Sreepathi, Oak Ridge National Laboratory

     
