Summary of the E3SM Leadership Meeting
The E3SM Leadership Team (the E3SM Executive Committee, Group Leads and Deputies, and the Program Manager) held a very productive in-person meeting in Bethesda, MD, on November 7-8, 2023, to discuss the current status of the project and its future plans. Particular focus was given to long-term strategy and project support, including communication, documentation, tutorial workshop planning, code review implementation, and the project’s best practice standards.
Dave Bader opened the meeting with “The State of our Project” address, which summed up the project’s many accomplishments and then immediately turned to the question “What’s next?”. This covered the project’s short- and long-term goals and deliverables, the way to achieve them (starting with realignment and updates to the phase 3 roadmaps), and the definition of the project’s 10-year vision and strategy for its next phases.
The updated 1- and 3-year roadmaps for each group were then presented by their Group Leads, who noted any changes or realignment. Next, the Lab Points of Contact (POCs) discussed staffing plans for phase 3 of the project to address the expected flat funding.
The next session covered support issues, such as communication and website needs, and the strategy for meeting them. It was presented by Renata McCoy, Project Engineer and Lead of the Communication Team. One way to better support the growing ecosystem projects (the projects funded by the DOE BER to work on problems related to E3SM) is to establish an in-person training session. Jill Zhang, the Infrastructure Group Deputy Lead, gave an update on the planning of the first hands-on tutorial workshop at NERSC, scheduled for May 2024. Another support-related talk was given by Rob Jacob, the Infrastructure Group Lead, on the standards established for documentation. These require the documentation to be developed together with the code, versioned according to the code version, and stored next to the code (for example, component documentation should be stored in GitHub with that component). Rob reminded everyone that the documentation should be written in Markdown format. He explained the different types of documentation (User’s Guide, Developer’s Guide, and Technical Guide) and discussed the status of the documentation and the considerable work still needed to meet the project’s objectives in this regard.
To prevent issues when adding new code to the model source code, last year the project spent a substantial amount of time revising and developing a strict process for Code Review and New Feature Additions. The process is (necessarily) arduous and time-consuming, and hence can be difficult to complete, which may be why the project is still noting some issues with compliance. Mark Taylor, the Chief Computational Scientist, reviewed the process and drew attention to some tasks that are still being implemented and need to be completed before the full Code Review process can be followed.
The next series of talks was on the project’s best practice standards and examples of what can go wrong. Rob Jacob discussed the development best practices that everyone at E3SM is expected to follow, namely: 1) new feature development needs to start with planning and documentation; 2) new code needs to be developed in a fork of the repository, kept up to date and not merged to master; 3) on the development machine, clone the fork and make a feature branch; 4) before making changes, create a baseline solution for future testing; 5) work on your feature by adding commits to your branch; 6) push the branch to GitHub; 7) test, test, test; 8) when finished, make a “Pull Request” (PR) from the fork to the upstream repository, pay attention to the standard message in the PR, and assign a reviewer.
The project also has well-developed standards for simulation best practices, which were presented by Chris Golaz, the Coupled System Group Lead. He talked about the steps that need to be taken before running a simulation, such as configuring the model according to the project standards and running a short test. While the simulation is running, comprehensive documentation should be developed following the project’s templates, the job should be monitored with PACE, and crashes should be recovered from gracefully. Short-term archiving should always be used before employing post-processing with zppy and long-term archiving with zstash to NERSC. Finally, the scratch space should be cleaned up when it is no longer needed.
The final talk in this session was an invigorating story about the forensics of some deeply hidden bugs, given by Andrew Roberts, the Deputy Lead of the Polar Processes, Sea-Level Rise, and Coastal Impacts Group. He discussed three cases to illustrate why rigorous coupled code review is essential, even though it is difficult and quite time-consuming. He argued that “The Code Review Process is inexpensive compared to the alternative”: the cost of a scientific mistake that has made it into the code is far higher than the cost of finding it before it enters the code. When chasing bugs, he encouraged objectivity, diplomacy, and patience!
The second day was devoted to discussions of strategy and vision, with an eye on the next decade. Both DOE (Dorothy Koch, Gary Geernaert, Xujing Davis) and the E3SM Executive Committee gave presentations offering their perspectives.
The presentations were followed by a long, free discussion of the strategies for future E3SM developments, mission, and vision. This was one of many planned brainstorming sessions on strategy. The leadership team will meet again, probably at least a few more times, before a crystallized vision comes into sharp focus.
Overall, the meeting was a profound success. All agreed that it was very fruitful and that such meetings should be held more frequently.
This article is a part of the E3SM “Floating Points” Newsletter.