DataLabs enable users with differing levels of skill and expertise to work together in an open, transparent and repeatable way to rapidly analyse and visualise a variety of data resources. Researchers making use of DataLabs have already explored a diverse range of environmental science challenges, such as habitat extent and condition, species distribution and crop modelling.
By supporting open, collaborative science with end-to-end provenance, DataLabs allow scientists and stakeholders to work together, enabling the discovery of data-driven solutions to current environmental issues.
NERC Environmental Data Service (EDS) researchers, based at the UK Centre for Ecology & Hydrology (UKCEH), have collaborated with a team at Lancaster University to develop a new, NERC-funded digital research platform known as ‘DataLabs’. As the volume and diversity of data generated in environmental science increase, it remains a challenge for scientists from different domains, policy makers and the public to work together to tackle the range of environmental problems we face today. DataLabs enable users with differing skillsets and areas of expertise to work together in an open, transparent and repeatable way to rapidly analyse and visualise a variety of data resources. Researchers making use of DataLabs have already explored a diverse range of environmental science challenges, such as habitat extent and condition, species distribution and crop modelling. Although currently running as a service on the NERC-funded JASMIN infrastructure, DataLabs can also be set up on other platforms.
DataLabs is a collaborative, scalable and dynamic digital research platform that requires only a web browser to access high-powered computing facilities and big data storage in the cloud. The platform allows researchers to disseminate results to stakeholders in an easily accessible format, and it records the end-to-end provenance of a particular workflow. The ability to facilitate collaboration and transparency in the iterative scientific process, together with the dynamic and tailorable nature of DataLabs, could help reduce the overall cost of research in future years.
One area where DataLabs have already played a prominent role is the application of change-point analysis to numerical model evaluation. A change-point is a point in a time series where a statistical property (e.g. the mean) undergoes a significant change. A DataLab has been developed to compare how well different models capture change-points in air temperature on the Greenland Ice Sheet. An app, developed by the DataLabs team, sits above the more detailed coding environment to allow users with different levels of expertise to explore the analysis. This is all done via a simple, user-friendly web page.
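To make the idea of a change-point concrete, the short Python sketch below finds the single most likely shift in the mean of a time series by trying every possible split and keeping the one that leaves the least variation within the two resulting segments. It is an illustration only and not the method used in the Greenland Ice Sheet DataLab: the function name `find_mean_change_point`, the `min_segment` parameter and the synthetic temperature-like values are all assumptions made for this example, and the sketch does not test whether the detected change is statistically significant.

```python
from statistics import fmean


def find_mean_change_point(series, min_segment=2):
    """Locate one change in the mean of `series` (hypothetical helper).

    Returns (index, cost_reduction), where `index` is the position of the
    first value after the change and `cost_reduction` is how much the sum
    of squared errors drops when the series is split there. `index` is
    None if no split improves on treating the series as one segment.
    """
    n = len(series)
    if n < 2 * min_segment:
        raise ValueError("series is too short for the requested min_segment")

    def sse(values):
        # Sum of squared deviations from the segment mean.
        mean = fmean(values)
        return sum((v - mean) ** 2 for v in values)

    total = sse(series)
    best_index, best_cost = None, total
    # Try every split that leaves at least `min_segment` values on each side.
    for k in range(min_segment, n - min_segment + 1):
        cost = sse(series[:k]) + sse(series[k:])
        if cost < best_cost:
            best_index, best_cost = k, cost
    return best_index, total - best_cost


# Synthetic values (not real observations): the mean jumps from about 0 to
# about 3 starting at position 5.
temperatures = [0.1, -0.2, 0.0, 0.2, -0.1, 3.1, 2.9, 3.0, 3.2, 2.8]
index, reduction = find_mean_change_point(temperatures)
print(index)  # 5
```

A real analysis would typically allow several change-points and add a penalty or significance test so that noise is not mistaken for a genuine shift; this exhaustive single-split search is only the simplest form of the idea.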
NERC EDS researchers involved in developing DataLabs have been showcasing its potential to a variety of users over the past twelve months. Training sessions run in collaboration with the Data Science of the Natural Environment (DSNE) project have allowed researchers, academics and PhD students funded by NERC and EPSRC to gain experience of setting up and using a DataLab via the app. Dr. Tom August, a computational ecologist at UKCEH, reports on the positive impact DataLabs has had on his species distribution work:
“During lockdown, DataLabs has provided our team with a great way to collaborate across some of our larger analytical projects. So much so, we are now moving our entire analytical workflow over to DataLabs to make use of the collaborative environment, clusters, and big data storage. We are even able to build interactive web applications in the same DataLabs so that others can explore our results”.
DataLabs have also allowed statisticians to explore natural capital mapping in ways that were not previously possible. Dr. Peter Henrys, a senior statistician at UKCEH, describes the impact this new platform has had on his work and that of his team:
"DataLabs have allowed us to achieve a scale of analysis and think more ambitiously than previously possible. This is illustrated by the ability to generate new national-scale natural capital mapping with the facility to explore confidence regionally, temporally and within the statistical models for generating different data products. This was not previously possible and will lead to new investigation of national patterns. DataLabs have also enabled this new analysis to be conducted in a fully collaborative manner such that all members of the project team were able to contribute and follow progress across the whole analytical workflow."
Future development of the DataLabs concept under NERC National Capability and UKRI programmes such as Constructing a Digital Environment will enable environmental scientists, statisticians, data scientists and computer scientists to work together to tackle some of the greatest challenges currently facing the natural world, in a way that has not been possible before. The use of DataLabs will, for example, support responses to acute environmental events and inform our understanding of long-term climate change. By supporting open, collaborative science with end-to-end provenance, DataLabs provide an infrastructure that allows scientists and stakeholders, such as policy makers, to work together in new, transparent ways, enabling the discovery of data-driven solutions to current environmental issues.
Paper: Holloway, M. J., Dean, G., Blair, G. S., Brown, M., Henrys, P. A., Watkins, J. (2020) Tackling the challenges of 21st Century open science and beyond: A Data Science Lab approach. Patterns Vol 1, issue 7, 100103. https://doi.org/10.1016/j.patter.2020.100103
DataLabs website: https://datalab.datalabs.ceh.ac.uk/