E3SM New Feature Software Policy

  • May 24, 2022

    Background

    During the 2021 leadership meeting, while discussing the delays in releasing version 2 of the E3SM code, it became apparent that the project needed to revisit its code integration policy and how that policy was implemented. Three reasons motivated this decision: (1) new features may come into E3SM from other BER-funded projects; (2) some of the delay in the V2 Water Cycle Group tuning effort was due to insufficient testing or feature evaluation during the integration process; and (3) the original 2015 code review policy, while quite thorough, assigned all the validation work to the now-defunct Coupled Model Group.

    The project decided to review the procedures for incorporating new features into the model, including documentation, evaluation, and code review. Mark Taylor chaired a “deep dive” with the following members: Gautam Bisht, Andrew Bradley, Peter Caldwell, Katherine Calvin, Chris Golaz, Oksana Guba, Ben Hillman, Matt Hoffman, Robert Jacob, Philip Jones, Andrew Roberts, Luke Van Roekel, Jinyun Tang, Jon Wolfe, and Walter Hannah.

    The New Policy

    The outcome of this review is a new policy and procedure for code review and for incorporating new features into the E3SM code-base.

    The full policy and process are described in Code Review and New Feature Process.

    Please note that all E3SM team members and PIs of ESMD-supported model development projects should familiarize themselves with this new, detailed code review process so that new developments can be tested early and be ready for possible integration into E3SM.

    The Deep-Dive Process and Changes

    The process of arriving at this policy and the major changes to the code review process are described below.

    The “deep dive” first reviewed several aspects of the V2 development and tuning process. The team looked at 10 case studies, covering the process of integrating several new features (new parameterizations in the atmosphere, new snow and sea ice physics, new model configurations) as well as several issues around mass and energy budgets. They documented what went well and what led to unexpected issues and delays in the V2 process. This analysis led to several revisions to E3SM’s code review and new feature integration policy. The main emphasis was on evaluating new features before they make it into the model, with a secondary emphasis on improved testing of important aspects of the model that are easy to inadvertently break. The “deep dive” also took the opportunity to streamline the original process for better integration into E3SM’s GitHub and Confluence practices.

    The team coined the phrase “stealth feature”. When a new feature is introduced into the code, the output can either be the same as before, which we call bit-for-bit or BFB, or it can change. Changes are characterized either as ‘roundoff’ (not changing the answer with statistical significance) or as climate changing. Often, though, a new feature is incorporated into the code turned off and unused, which effectively means that the output will be BFB. When changes are BFB, the code is allowed to be integrated into the code-base without further testing. A stealth feature is such a feature: one that is integrated into the model turned off by default, but with an option to turn it on at a later time. This is a common and useful development strategy, since it makes the feature available for further testing while keeping it hidden from most developers and end users, but it was also found to cause a failure mode that led to delays in the V2 release. Because stealth features are turned off by default, they could be integrated into the E3SM code-base even if they were far from ready for fully coupled simulations.
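
    The three outcomes described above (BFB, roundoff, climate changing) can be illustrated with a minimal Python sketch. It is not E3SM's actual test procedure. It assumes each run is summarized by one scalar metric per ensemble member, and it uses Welch's t statistic with a hypothetical cutoff (t_threshold) as a stand-in for "statistical significance". The function names and the sample numbers are invented for illustration.

```python
import struct
from math import sqrt
from statistics import mean, variance


def is_bit_for_bit(baseline, candidate):
    """True only if both sequences hold the same floats, bit pattern for bit pattern."""
    if len(baseline) != len(candidate):
        return False

    def pack(values):
        # Serialize as raw 64-bit doubles so 0.0 and -0.0 count as different.
        return struct.pack(f"{len(values)}d", *values)

    return pack(baseline) == pack(candidate)


def classify_change(baseline, candidate, t_threshold=2.0):
    """Classify a code change from one scalar metric per ensemble member.

    Each ensemble needs at least two members. t_threshold is a hypothetical
    cutoff on Welch's t statistic, not an E3SM-defined value.
    """
    if is_bit_for_bit(baseline, candidate):
        return "BFB"
    # Standard error of the difference between the two ensemble means.
    std_err = sqrt(variance(baseline) / len(baseline)
                   + variance(candidate) / len(candidate))
    if std_err == 0.0:
        # The ensembles differ but have no spread, so any shift is systematic.
        return "climate changing"
    t_stat = abs(mean(candidate) - mean(baseline)) / std_err
    return "roundoff" if t_stat < t_threshold else "climate changing"


# Made-up metric values, for illustration only.
baseline = [14.0, 14.1, 13.9, 14.05]
print(classify_change(baseline, list(baseline)))                 # BFB
print(classify_change(baseline, [x + 1e-12 for x in baseline]))  # roundoff
print(classify_change(baseline, [x + 1.0 for x in baseline]))    # climate changing
```

    In this sketch, a feature merged turned off by default lands in the first branch: the output is identical, so the check says nothing about how the feature behaves once it is switched on.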

    The biggest change in E3SM’s new policy is the review of these stealth features. This includes a streamlined design document to enable review of the feature before integration is even considered, and more rigorous testing, including the ability to run reasonably well in the coupled system. The team recognizes that defining “reasonably well” is difficult, since desirable improvements in one subcomponent can reveal previously compensating errors in other components. The path to an improved model often first requires making the model worse, and stealth features are an important part of this process.

    A second new aspect of the policy is the development and maintenance of reference solutions. This formalizes the best practices already followed by many developers. E3SM will maintain reference solutions of the E3SM main branch, consisting of 100-year runs of the coupled system for each of the main simulation campaigns (Water Cycle, Cryosphere, BGC), and similar reference solutions for the component models. These solutions will show the state of E3SM at all times during the development process, ensure that the main branch always produces a well-understood climate, and document the incremental changes to the model.

    The new code review process requires significant investments in E3SM’s testing and diagnostics infrastructure, along with development of documentation on how to evaluate changes with respect to the reference solutions. This documentation will describe the key metrics important to the E3SM science campaigns, as well as many other aspects of the model that are important to maintain or improve. This will be a valuable resource, quantifying much of the expert judgment that goes into assembling each E3SM model release. These reference solutions will make it easier for developers to measure the impact of their new features on many other properties of the E3SM model that may be outside their area of expertise. This work is starting now and should be completed early in the E3SM V3 development process.
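
    As a rough illustration of what evaluating a change against a reference solution could look like, the Python sketch below compares a candidate run's summary metrics to reference values and flags those that move by more than a chosen tolerance. Everything in it is an assumption made for illustration: the metric names, the numbers, the tolerances, and the helper names are hypothetical and do not come from E3SM's documentation or diagnostics tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricCheck:
    """One metric compared between the reference solution and a candidate run."""
    name: str
    reference: float
    candidate: float
    tolerance: float

    @property
    def delta(self):
        return self.candidate - self.reference

    @property
    def within_tolerance(self):
        return abs(self.delta) <= self.tolerance


def compare_to_reference(reference, candidate, tolerances):
    """Return one MetricCheck per reference metric, sorted by metric name.

    All three arguments are dicts keyed by metric name. A metric missing
    from the candidate or from the tolerances raises KeyError, so a gap in
    the evaluation is never silently skipped.
    """
    return [
        MetricCheck(name, reference[name], candidate[name], tolerances[name])
        for name in sorted(reference)
    ]


# Hypothetical metric names and placeholder numbers, not real E3SM results.
reference = {"global_mean_precip_mm_day": 3.0, "toa_net_radiation_w_m2": 0.5}
candidate = {"global_mean_precip_mm_day": 3.02, "toa_net_radiation_w_m2": 1.4}
tolerances = {"global_mean_precip_mm_day": 0.1, "toa_net_radiation_w_m2": 0.3}

for check in compare_to_reference(reference, candidate, tolerances):
    status = "ok" if check.within_tolerance else "REVIEW"
    print(f"{check.name}: delta={check.delta:+.3f} [{status}]")
```

    A table like this lets a developer see the effect of a feature on metrics outside their own area. Deciding whether a flagged change is acceptable remains a matter of expert judgment.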
