Brightband has signed a Cooperative Research and Development Agreement (CRADA) with the National Oceanic and Atmospheric Administration (NOAA).
Under the CRADA, titled “Making NOAA Observation Data Artificial Intelligence-Ready”, Brightband is collaborating with NOAA scientists at the Physical Sciences Laboratory to transform a large archive of observational data from satellites, weather balloons, and surface stations into “NNJA-AI”, an open archive designed for geospatial foundation AI models. Brightband will present its vision for this AI-ready dataset, along with a preview release of both data and software tools, at the American Meteorological Society’s Annual Meeting in New Orleans, LA on January 13.
NOAA, together with NASA and through the exchange of environmental data with other countries via the World Meteorological Organization (WMO), curates one of the most valuable collections of observation data in the world. The National Centers for Environmental Information (NCEI) holds over 60 petabytes of environmental data today, a figure expected to grow to 400 petabytes by 2030. By comparison, even the largest estimates put the training data used for GPT-4, itself enormous, at 1 petabyte, a small fraction of NCEI’s holdings. NOAA and the USA are committed to open data, which, combined with leadership in AI technology, promises to help the USA become a global leader in AI applications for weather and climate modeling.
Brightband is making NOAA’s observational data archive AI-ready by converting it from older, difficult-to-use data formats into modern, analysis-ready, cloud-optimized ones that enable rapid access to, down-selection of, and processing of data at scale in the cloud. In much the same way that the ERA5 reanalysis from the European Centre for Medium-Range Weather Forecasts (ECMWF) catalyzed the first wave of AI weather model development, Brightband hopes that the NNJA-AI dataset will serve as the foundation of data-driven weather forecasting tools. “As more groups work to use machine learning to improve data assimilation and incorporate observations into weather forecasts, having a single, comprehensive, and easy-to-use dataset will accelerate research efforts,” notes Daniel Rothenberg, one of Brightband’s co-founders and the company’s Head of Data and Weather. Rothenberg is a leader in the atmospheric and climate science data community, having previously helped to build the Pangeo community and toolkit (including libraries like Zarr and xarray) to scalably handle petascale datasets. “Our collaboration with NOAA will ensure that the community has access to the best possible dataset to use for this work.”