Marine science in the cloud: Knauss fellow assists with data policy
The National Oceanic and Atmospheric Administration hosts more than 37 petabytes of data—containing everything from satellite images to fish genetics. Each petabyte stores roughly 20 million filing cabinets’ worth of information.
Making these datasets more widely available has been a priority for the federal government, and NOAA has one of the largest data archives of all the federal agencies. As a Knauss Fellow working with NOAA’s chief data officer, Chase Long worked on the agency’s data management strategy.
During his fellowship, Long learned about recent federal policies driving NOAA’s efforts, such as the Big Data Project, to make its data FAIR: findable, accessible, interoperable, and reusable. Through the project, cloud service providers agreed to host NOAA’s popular datasets and make them available to the public.
“Basically, policies directed agencies to treat their data as an asset, rather than just treating it as a byproduct that comes out of everything else they do,” Long said.
Although NOAA is responsible for monitoring climate, it is also part of the Department of Commerce, whose mission is to strengthen the economy. For example, NOAA already shares weather data with organizations that provide local weather forecasts.
“Companies like the Weather Channel produce ways of connecting with users that aren’t really in the wheelhouse of a federal agency like NOAA to produce,” Long said. “We want to see more innovation on ocean data and see what kind of demand there is in the public for novel products based on the data NOAA already collects.”
Previously, researchers or private companies could access datasets through requests to the agency, and some datasets were available through online data portals. But some information can be hard to find unless you know where to look or who to contact, Long said.
Even for researchers who know how to find datasets, the information can take a long time to get. At NOAA’s National Centers for Environmental Information, more than a million terabytes of data are stored on servers. Copying just the NEXRAD weather radar archives off the servers, for example, could take weeks.
“If you wanted to use that for a study or something, you would have to wait for a whole month just to get the entire archive,” Long said. “Then, if someone else came along and wanted to use it, NCEI would have to go through that whole process again because it’s on cold storage.”
After NOAA moved the NEXRAD data archive to the cloud, anyone could access the entire archive immediately. Already, researchers have used satellite images along bird migration routes to show how climate change has altered bird activity. One company has even developed an algorithm that rapidly detects wildfires by scanning NOAA datasets.
“The hope is that this will lead to more scientific breakthroughs more rapidly, it will lead to more experimentation,” Long said. “Grad students will have an easier time finding datasets that might be of interest to the work that they’re doing. Companies can start to innovate.”
As a Knauss Fellow, Long helped coordinate events like the ocean data roundtable meeting held in February 2020. He also worked with NOAA offices to plan how different parts of the agency could apply the federal policies to their research and data. Not all datasets can be shared. Certain fisheries datasets, for example, are protected because they contain private business information. However, other parts of the agency that deal with weather routinely share satellite images with weather forecasters.
Data points are most useful at two times: first, immediately after they are recorded, and again later, as they gradually regain value when the information accumulates into a long-running record. Making these records easy to find and easy to use will keep NOAA at the forefront of open data policies, Long said.
“I really have developed a respect and appreciation for NOAA’s researchers and mission,” Long said. “And because of that mission, people care about making the data that this agency collects available.”
- As a Knauss Fellow working with NOAA’s chief data officer, Chase Long worked on the agency’s data management strategy, coordinating events and planning how different parts of NOAA could apply federal policies to their data.
- During his fellowship, Long learned about recent federal policies driving NOAA’s efforts, such as the Big Data Project, to make its data FAIR: findable, accessible, interoperable, and reusable.
- More accessible data will help spur innovations and new scientific breakthroughs.
Photos and Video by Aileen Devlin | Virginia Sea Grant
Published Jan. 6, 2021.