As part of the recent NERC Environmental Data Service (EDS) project High5 for AI, we have enabled new ways of finding and accessing some of our high-value datasets. This includes producing new metadata files and creating notebooks that demonstrate new data access workflows. Here are some examples of how we have made key datasets more accessible:

  • The British Oceanographic Data Centre (BODC) has now made all their holdings of water column CTD (Conductivity, Temperature, Depth) profile data available through a federated ERDDAP server.  
  • The National Geoscience Data Centre (NGDC) built Application Programming Interfaces (APIs) on dataset holdings including offshore sample data, digital mapping and borehole materials.
  • The Centre for Environmental Data Analysis (CEDA) promoted several relevant datasets, including the European Space Agency (ESA) Climate Change Initiative (CCI) soil-moisture dataset, and highlighted the use of the SpatioTemporal Asset Catalog (STAC) for data discovery and access.
  • The Environmental Information Data Centre (EIDC) built a custom data loader that prepares an array-like object from their landcover dataset for use in a Machine Learning (ML) model workflow. 

While all these data were previously available through the data centre websites, the new delivery mechanisms should be more user-friendly and integrate far more seamlessly with software systems and downstream data pipelines.
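To illustrate what integrating with a software pipeline can look like, here is a minimal sketch of building a request URL for an ERDDAP tabledap dataset using only the Python standard library. It follows the general ERDDAP pattern of server, dataset ID, file type, requested variables and constraints. The server address, dataset ID and variable names are hypothetical placeholders, not the actual BODC identifiers.

```python
from urllib.parse import quote


def build_tabledap_url(server, dataset_id, variables, constraints=(), file_type="csv"):
    """Build an ERDDAP tabledap request URL.

    General pattern:
    {server}/tabledap/{dataset_id}.{file_type}?{var1,var2,...}&{constraint}&...
    """
    query = ",".join(variables)
    for constraint in constraints:
        # Percent-encode operators such as > and <, leaving = , : - . readable.
        query += "&" + quote(constraint, safe="=,:-._~")
    return f"{server.rstrip('/')}/tabledap/{dataset_id}.{file_type}?{query}"


# Hypothetical server, dataset ID and variable names, for illustration only.
url = build_tabledap_url(
    "https://example.org/erddap",
    "ctd_profiles",
    ["time", "latitude", "longitude", "depth", "temperature"],
    ["time>=2020-01-01T00:00:00Z", "depth<=100"],
)
print(url)
```

A URL of this form can usually be passed straight to a CSV reader. Note that ERDDAP's plain .csv output places a row of units beneath the column names, which most readers need to skip.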

In addition to exposing the data itself, we have improved the Artificial Intelligence (AI) readiness of these datasets by providing Croissant format metadata for each of them. This format describes the structure and content of the data, along with the metadata that machines need to access, download, and read the data. This prepares our data for ingestion into machine learning pipelines, simplifying the process and saving time for data scientists.
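As a rough illustration of what such metadata lets a machine discover, the sketch below parses a heavily simplified Croissant-style JSON-LD fragment with the Python standard library and lists the downloadable files and the typed fields it declares. The dataset name, URL and field names are invented for this example, and the fragment omits the "@context" block and other properties a complete Croissant file would carry. In practice a dedicated Croissant library would normally be used instead of hand-parsing.

```python
import json

# Simplified, hypothetical Croissant-style metadata (not a complete, valid file).
CROISSANT_JSON = """
{
  "@type": "sc:Dataset",
  "name": "example_ctd_profiles",
  "distribution": [
    {
      "@type": "cr:FileObject",
      "@id": "profiles.csv",
      "contentUrl": "https://example.org/data/profiles.csv",
      "encodingFormat": "text/csv"
    }
  ],
  "recordSet": [
    {
      "@type": "cr:RecordSet",
      "name": "profiles",
      "field": [
        {"@type": "cr:Field", "name": "depth", "dataType": "sc:Float"},
        {"@type": "cr:Field", "name": "temperature", "dataType": "sc:Float"}
      ]
    }
  ]
}
"""


def summarise(metadata):
    """Return ({file id: download URL}, [(record set, field, data type), ...])."""
    files = {
        item["@id"]: item["contentUrl"]
        for item in metadata.get("distribution", [])
        if item.get("@type") == "cr:FileObject"
    }
    fields = [
        (record_set["name"], field["name"], field["dataType"])
        for record_set in metadata.get("recordSet", [])
        for field in record_set.get("field", [])
    ]
    return files, fields


files, fields = summarise(json.loads(CROISSANT_JSON))
print(files)
print(fields)
```

Because the file locations and field types are declared in the metadata rather than in prose documentation, the same small routine works for any dataset described this way.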

The High5 for AI project aligned practices across the EDS data centres, adopting Croissant as a metadata standard for accessing datasets. Providing this consistent interface for accessing and loading datasets across all environmental domains is a key objective of the EDS and will help users address broader environmental questions. It enables standardised read/write utilities and will also allow the creation of standardised tools for quality assurance and validation of supplied data, as well as standardised visualisation of these data (e.g. via dashboards), regardless of their specific contents.

A screenshot of the ERDDAP search function.