xCDAT: A Python Package for Simple and Robust Analysis of Climate Data

  • November 25, 2024
  • Releases
  • xCDAT is a modern, simple, and robust Python package that addresses the performance needs of climate data analysis.

     

    The Science

    The volume of climate data continues to grow due to a larger pool of data products and increasing spatiotemporal resolution of model and observational data. As a result, analyzing climate data requires highly performant core operations such as reading and writing netCDF files, horizontal and vertical regridding, and spatial and temporal averaging. xCDAT addresses the need for modern and performant analysis software by combining the power of Xarray with geospatial analysis features developed by the Community Data Analysis Tools (CDAT) library.

    The Impact

    xCDAT (Xarray Climate Data Analysis Tools) is an open-source Python package that extends Xarray, a popular Python library for working with multi-dimensional arrays, for routine climate data analysis operations on structured grids. xCDAT aims to promote software sustainability and scientific reproducibility by providing simple, robust, and well-documented features.

    Figure 1. Output based on core xCDAT capabilities.

    Summary

    xCDAT focuses on routine climate research analysis operations such as loading, averaging, and regridding data on structured grids (e.g., rectilinear, curvilinear). Key features include temporal averaging, geospatial averaging, horizontal regridding, vertical regridding, and robust interpretation and handling of metadata and bounds for coordinates (Fig. 1). xCDAT leverages other powerful packages in the Xarray ecosystem, including xESMF, xgcm, and cf_xarray.
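    To illustrate why coordinate bounds matter for geospatial averaging, the sketch below computes an area-weighted mean over latitude bands using only the Python standard library. It shows the general technique (weighting each band by the difference of the sines of its bounds) and is not xCDAT's implementation; the function names, the band layout, and the temperature values are hypothetical and chosen only for illustration.

```python
import math


def latitude_weights(lat_bounds):
    """Return relative area weights for latitude bands.

    lat_bounds is a list of (south, north) pairs in degrees. On a sphere,
    the area of a band is proportional to sin(north) - sin(south).
    """
    return [
        math.sin(math.radians(north)) - math.sin(math.radians(south))
        for south, north in lat_bounds
    ]


def weighted_mean(values, weights):
    """Return the weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical zonal-mean temperatures (K) on three latitude bands.
lat_bounds = [(-90.0, -30.0), (-30.0, 30.0), (30.0, 90.0)]
zonal_temps = [260.0, 300.0, 270.0]

weights = latitude_weights(lat_bounds)
area_mean = weighted_mean(zonal_temps, weights)
simple_mean = sum(zonal_temps) / len(zonal_temps)

print(f"area-weighted mean: {area_mean:.2f} K")
print(f"unweighted mean:    {simple_mean:.2f} K")
```

    The tropical band covers half of the sphere's surface, so it receives twice the weight of each polar band. An unweighted mean treats the three bands equally and gives a different answer, which is why correct bounds handling matters for this kind of average.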

    xCDAT aims to promote software sustainability and scientific reproducibility in climate analysis code. It abstracts Xarray boilerplate code for core analysis operations, resulting in code that is more reusable, more readable, and less error-prone than pure Xarray implementations. xCDAT ensures general compatibility with data regardless of source by operating on datasets that comply with the Climate and Forecast (CF) metadata conventions. It also addresses performance needs by inheriting Dask support through Xarray, which enables users to analyze large climate datasets more efficiently with parallel computing.
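    One detail that is easy to get wrong in hand-written analysis code is weighting in temporal averages. The sketch below, using only the Python standard library, computes an annual mean from monthly values weighted by the number of days in each month. It assumes monthly data on the Gregorian calendar, illustrates the general technique rather than xCDAT's implementation, and uses a hypothetical function name and made-up values.

```python
import calendar


def annual_mean(monthly_values, year):
    """Return the annual mean of 12 monthly values, weighted by month length."""
    if len(monthly_values) != 12:
        raise ValueError("expected 12 monthly values")
    # Number of days in each month of the given year (handles leap years).
    days = [calendar.monthrange(year, month)[1] for month in range(1, 13)]
    return sum(v * d for v, d in zip(monthly_values, days)) / sum(days)


# Hypothetical monthly means: the value for each month equals its month number.
monthly_values = [float(month) for month in range(1, 13)]

weighted = annual_mean(monthly_values, 2023)
unweighted = sum(monthly_values) / len(monthly_values)

print(f"weighted annual mean:   {weighted:.4f}")
print(f"unweighted annual mean: {unweighted:.4f}")
```

    The two results differ slightly because months are not equally long. Packaging this kind of logic behind a tested interface is one way a library can reduce the chance of subtle errors across many analysis scripts.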

    Since its inception in early 2021, xCDAT has gained widespread adoption throughout the open-source community. As of July 2024, it has accumulated over 15,000 total downloads on Anaconda and 100 stars on GitHub, an indicator of project popularity. Its users come from projects and organizations across the globe, including the Energy Exascale Earth System Model (E3SM), the Program for Climate Model Diagnosis and Intercomparison (PCMDI), the National Aeronautics and Space Administration (NASA), and the Institut Pierre-Simon Laplace (IPSL). At Lawrence Livermore National Laboratory (LLNL), xCDAT and Xarray are becoming staple tools for routine climate research. xCDAT is currently being integrated as a data processing engine within the PCMDI Metrics Package and the E3SM Diagnostics Package. It is also included in the E3SM Unified Environment as a tool for post-processing and analyzing E3SM data.

    xCDAT’s mission is to serve the needs of the climate science community over the long term. It is a mission-driven, open-source project that encourages community contributions through the GitHub repository.

    Publication

    • Vo, Tom, Stephen Po-Chedley, Jason Boutte, Jiwoo Lee, and Chengzhu Zhang. 2024. “xCDAT: A Python Package for Simple and Robust Analysis of Climate Data”. Journal of Open Source Software 9 (98). The Open Journal: 6426. doi:10.21105/joss.06426.

    Funding

    • This work was supported by the Earth System Model Development program area of the Department of Energy, Office of Science, Biological and Environmental Research program.

    Contact

    • Tom Vo, Lawrence Livermore National Laboratory