As part of a Cooperative Research and Development Agreement with NOAA, Brightband is building AI-ready observational datasets to power a new generation of machine learning-based weather and climate prediction tools. The first dataset we’re launching as part of this partnership is a re-processed version of the NOAA-NASA Joint Archive (NNJA) of Observations for Earth System Reanalysis. Initially designed to support R&D focused on improving weather and climate reanalysis modeling, the NNJA archive is an ideal dataset for developing observation-driven weather forecasting tools: it includes a wide cross-section of data from many sensing platforms (satellites, surface stations, weather balloons, and more) and covers 1979 to the present.
The original NNJA dataset was published in BUFR format, which is awkward to work with. NNJA-AI (pronounced “Ninja AI”) greatly simplifies this by providing a well-structured archive of the data re-processed into a contemporary, analysis-ready, cloud-optimized tabular format that integrates easily into whatever workflow a user brings to the data. This dataset forms the backbone of Brightband’s R&D on machine learning-powered data assimilation and weather forecasting techniques.
The first major release of this dataset, version 1.0, is now available on Google Cloud Storage at gs://nnja-ai, hosted with support from the NOAA Open Data Dissemination (NODD) Program. It contains the complete, re-processed record of over a dozen sensors onboard a mixture of geostationary and low-Earth-orbit satellites, as well as conventional data from surface stations and radiosondes. You can access the structured, Hive-partitioned Parquet files comprising the archive directly, or you can use our Python-based Software Development Kit (SDK), available on PyPI or directly from GitHub. For more details, including demonstration notebooks, check out our documentation.
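For readers new to Hive partitioning: column values are encoded in directory names as `key=value` segments, so a reader can tell which rows a file holds from its path alone. The sketch below shows the idea using only the Python standard library. The helper name `parse_hive_partitions`, the partition key `OBS_DATE`, and the example path are all hypothetical, chosen for illustration; the actual layout under gs://nnja-ai may differ, so check the documentation before relying on any specific key.

```python
from pathlib import PurePosixPath


def parse_hive_partitions(object_path: str) -> dict[str, str]:
    """Extract key=value partition segments from a Hive-style object path."""
    partitions = {}
    # Skip the last segment: it is the file name, not a partition directory.
    for segment in PurePosixPath(object_path).parts[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key and value:
            partitions[key] = value
    return partitions


# Hypothetical object path; the real layout of the archive is an assumption here.
example = "some_sensor/OBS_DATE=2021-01-01/part-0.parquet"
print(parse_hive_partitions(example))
```

Parquet readers that understand Hive partitioning use these same path segments to skip files that cannot match a filter, which is what makes reading a narrow date range from a multi-decade archive practical.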
The chart below shows the temporal coverage for each observation type included in the dataset:

We’d love to learn more about how you’re using this data! Please drop us a line at hello@brightband.com. We plan to continue expanding this dataset and would be happy to learn about ways we can support your use cases.