Brightband launches NNJA-AI dataset

Daniel Rothenberg

Brightband is collaborating with scientists at the National Oceanic and Atmospheric Administration (NOAA) Physical Sciences Laboratory. Together, they are transforming a large archive of observational data from satellites, weather balloons, surface stations, and other sources into an open archive designed for geospatial foundation AI models. Called "NNJA-AI", this archive reimagines a dataset originally intended for reanalysis development as a resource that will power a new wave of machine learning weather prediction (MLWP) models. The work is part of Brightband's Cooperative Research And Development Agreement (CRADA) partnership with NOAA, "Making NOAA Observation Data Artificial Intelligence-Ready".

Several years ago, the field of MLWP exploded as AI researchers bootstrapped innovative modeling approaches on the back of a widely available "reanalysis" dataset called ERA5 from the European Centre for Medium-Range Weather Forecasts (ECMWF). ERA5 was open access and freely available, and teams such as Google Research published cloud-optimized versions that made the data easy for non-geoscientists to work with. Our aim for NNJA-AI is to enable the same ease of experimentation and development for a new generation of observation-driven MLWP models.

Although recent innovations are starting to enable observation-driven MLWP, working with observational data still presents a myriad of challenges. Observational data is multi-modal and heterogeneous. Many different providers distribute it using different technologies (APIs, FTP/cloud buckets), formats (BUFR, HDF5/NetCDF, ASCII, other binary), data standards (ECMWF ParamDB, CF Conventions, DX BUFR), and licenses (a mixture of open and restricted commercial). Curating observational data for AI/ML is therefore a complex engineering effort, and that complexity is an impediment to rapid research progress.

We believe that FAIR (Findable, Accessible, Interoperable, Reusable) and open data benefit greatly from modern, easy-to-use data tools and software. As part of NNJA-AI, we aim to process the entire original NNJA dataset (currently published on AWS S3) into a contemporary, cloud-native format, and to keep it up to date as NOAA scientists improve the archive. We are also providing open source tools to help developers and scientists explore the archive and integrate this data into their own workflows.

You can now access Brightband's preview release of a curated dataset of weather observations, along with software tools tailored specifically to the needs of AI weather forecasting applications, at brightband.com/data.

The NNJA-AI archive is published under a CC-BY 4.0 license.

One of the advantages of having such an extensive dataset available with modern data tooling and software is that producing pretty pictures becomes easy! 💯

Here is one look at some of the NNJA-AI observational data.

NNJA-AI Sample Visualization

Note: NNJA stands for the NOAA-NASA Joint Archive of Observations for Earth System Reanalysis. NNJA is pronounced "Nin-jah" 🥷