Centre for Environmental Data Analysis
Government · Harwell, United Kingdom
Research output, citation impact, and the most-cited recent papers from the Centre for Environmental Data Analysis. Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from the Centre for Environmental Data Analysis
Abstract. The Coupled Model Intercomparison Project (CMIP) has successfully provided the climate community with a rich collection of simulation output from Earth system models (ESMs) that can be used to understand past climate changes and make projections and uncertainty estimates of the future. Confidence in ESMs can be gained because the models are based on physical principles and reproduce many important aspects of observed climate. More research is required to identify the processes that are most responsible for systematic biases and the magnitude and uncertainty of future projections so that more relevant performance tests can be developed. At the same time, there are many aspects of ESM evaluation that are well established and considered an essential part of systematic evaluation but have been implemented ad hoc with little community coordination. Given the diversity and complexity of ESM analysis, we argue that the CMIP community has reached a critical juncture at which many baseline aspects of model evaluation need to be performed much more efficiently and consistently. Here, we provide a perspective and viewpoint on how a more systematic, open, and rapid performance assessment of the large and diverse number of models that will participate in current and future phases of CMIP can be achieved, and announce our intention to implement such a system for CMIP6. Accomplishing this could also free up valuable resources as many scientists are frequently "re-inventing the wheel" by re-writing analysis routines for well-established analysis methods. A more systematic approach for the community would be to develop and apply evaluation tools that are based on the latest scientific knowledge and observational reference, are well suited for routine use, and provide a wide range of diagnostics and performance metrics that comprehensively characterize model behaviour as soon as the output is published to the Earth System Grid Federation (ESGF). 
The CMIP infrastructure enforces data standards and conventions for model output and documentation accessible via the ESGF, additionally publishing observations (obs4MIPs) and reanalyses (ana4MIPs) for model intercomparison projects using the same data structure and organization as the ESM output. This largely facilitates routine evaluation of the ESMs, but to be able to process the data automatically alongside the ESGF, the infrastructure needs to be extended with processing capabilities at the ESGF data nodes where the evaluation tools can be executed on a routine basis. Efforts are already underway to develop community-based evaluation tools, and we encourage experts to provide additional diagnostic codes that would enhance this capability for CMIP. At the same time, we encourage the community to contribute observations and reanalyses for model evaluation to the obs4MIPs and ana4MIPs archives. The intention is to produce through the ESGF a widely accepted quasi-operational evaluation framework for CMIP6 that would routinely execute a series of standardized evaluation tasks. Over time, as this capability matures, we expect to produce an increasingly systematic characterization of models which, compared with early phases of CMIP, will more quickly and openly identify the strengths and weaknesses of the simulations. This will also reveal whether long-standing model errors remain evident in newer models and will assist modelling groups in improving their models. This framework will be designed to readily incorporate updates, including new observations and additional diagnostics and metrics as they become available from the research community.
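As an illustration of the kind of routine performance metric such a framework might compute, the following Python sketch derives an area-weighted mean bias and root-mean-square error (RMSE) of a model field against an observational reference. It assumes both fields are already on the same regular latitude-longitude grid, with grid-cell area proportional to cos(latitude); the function name is hypothetical and is not part of any CMIP or ESGF tool.

```python
import numpy as np


def area_weighted_stats(model: np.ndarray, obs: np.ndarray, lat_deg: np.ndarray) -> dict:
    """Area-weighted mean bias and RMSE of a (lat, lon) model field against a reference.

    Hypothetical helper. Assumes both fields share one regular lat-lon grid and
    that grid-cell area is proportional to cos(latitude).
    """
    if model.shape != obs.shape:
        raise ValueError("model and obs must be on the same grid")
    # One weight per latitude row, broadcast across all longitudes.
    weights = np.broadcast_to(np.cos(np.deg2rad(lat_deg))[:, np.newaxis], model.shape)
    diff = model - obs
    bias = np.sum(weights * diff) / np.sum(weights)
    rmse = np.sqrt(np.sum(weights * diff**2) / np.sum(weights))
    return {"bias": float(bias), "rmse": float(rmse)}
```

A quasi-operational system would run many such metrics over each newly published simulation; this sketch only shows the arithmetic of one.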
Abstract. The Community Intercomparison Suite (CIS) is an easy-to-use command-line tool which has been developed to allow the straightforward intercomparison of remote sensing, in situ and model data. While there are a number of tools available for working with climate model data, the large diversity of sources (and formats) of remote sensing and in situ measurements necessitated a novel software solution. Developed by a professional software company, CIS supports a large number of gridded and ungridded data sources "out-of-the-box", including climate model output in NetCDF or the UK Met Office pp file format, CloudSat, CALIOP (Cloud-Aerosol Lidar with Orthogonal Polarization), MODIS (MODerate resolution Imaging Spectroradiometer), Cloud and Aerosol CCI (Climate Change Initiative) level 2 satellite data and a number of in situ aircraft and ground station data sets. The open-source architecture also supports user-defined plugins to allow many other sources to be easily added. Many of the key operations required when comparing heterogeneous data sets are provided by CIS, including subsetting, aggregating, collocating and plotting the data. Output data are written to CF-compliant NetCDF files to ensure interoperability with other tools and systems. The latest documentation, including a user manual and installation instructions, can be found on our website (http://cistools.net). Here, we describe the need which this tool fulfils, followed by descriptions of its main functionality (as at version 1.4.0) and plugin architecture which make it unique in the field.
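Collocation, one of the operations listed above, means sampling one dataset at the locations of another. A minimal nearest-neighbour sketch in Python follows. It is not CIS's implementation or API: it assumes a regular latitude-longitude grid, measures distance in plain degrees and ignores longitude wrap-around.

```python
import numpy as np


def collocate_nearest(grid_lat, grid_lon, grid_values, pt_lat, pt_lon):
    """Sample a gridded (lat, lon) field at scattered points by nearest neighbour.

    Illustrative sketch only: on a rectilinear grid the nearest cell centre can be
    found independently along each axis. Longitude wrap-around is ignored.
    """
    grid_lat = np.asarray(grid_lat)
    grid_lon = np.asarray(grid_lon)
    # Index of the closest latitude row and longitude column for every point.
    i = np.abs(grid_lat[:, None] - np.asarray(pt_lat)[None, :]).argmin(axis=0)
    j = np.abs(grid_lon[:, None] - np.asarray(pt_lon)[None, :]).argmin(axis=0)
    return np.asarray(grid_values)[i, j]
```

For example, sampling a model field along an aircraft track would pass the track's latitudes and longitudes as `pt_lat` and `pt_lon`.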
Abstract The emergence of exascale computing and artificial intelligence offers tremendous potential to significantly advance Earth system prediction capabilities. However, enormous challenges must be overcome to adapt models and prediction systems to use these new technologies effectively. A 2022 WMO report on exascale computing recommends “urgency in dedicating efforts and attention to disruptions associated with evolving computing technologies that will be increasingly difficult to overcome, threatening continued advancements in weather and climate prediction capabilities.” Further, the explosive growth in data from observations, model and ensemble output, and postprocessing threatens to overwhelm the ability to deliver timely, accurate, and precise information needed for decision-making. Artificial intelligence (AI) offers untapped opportunities to alter how models are developed, observations are processed, and predictions are analyzed and extracted for decision-making. Given the extraordinarily high cost of computing, the growing complexity of prediction systems, and the increasingly unmanageable amount of data being produced and consumed, these challenges are rapidly becoming too large for any single institution or country to handle. This paper describes key technical and budgetary challenges, identifies gaps and ways to address them, and makes a number of recommendations.
Significance Statement. Earth system modeling and prediction stands at a crossroads. Exascale computing and artificial intelligence (AI) offer powerful new capabilities to advance Earth system predictions. However, models, assimilation, and data processing systems are increasingly unable to exploit these new technologies due to scientific, software, and computational limitations. Significant changes to the models, including algorithms, software, and parallelism, are needed to run models efficiently on diverse exascale systems.
While AI offers significant potential, it is unclear to what degree it can be developed and integrated into existing prediction systems. We recommend that models be redesigned, linking science, software, and computing in codesign efforts to fully exploit exascale and AI. Special efforts are needed to recruit, train, and retain a highly skilled, interdisciplinary workforce. Given the high cost, shared computing and data facilities may become necessary.
Abstract. In developing methods for convective-scale data assimilation (DA), it is necessary to consider the full range of motions governed by the compressible Navier–Stokes equations (including non-hydrostatic and ageostrophic flow). These equations describe motion on a wide range of timescales with non-linear coupling. For the purpose of developing new DA techniques that suit the convective-scale problem, it is helpful to use so-called toy models that are easy to run and contain the same types of motion as the full equation set. Such a model needs to permit hydrostatic and geostrophic balance at large scales but allow imbalance at small scales, and in particular, it needs to exhibit intermittent convection-like behaviour. Existing toy models are not always sufficient for investigating these issues. A simplified system of intermediate complexity derived from the Euler equations is presented, which supports dispersive gravity and acoustic modes. In this system, the separation of timescales can be greatly reduced by changing the physical parameters. Unlike in existing toy models, this allows the acoustic modes to be treated explicitly and hence inexpensively. In addition, the non-linear coupling induced by the equation of state is simplified. This means that the gravity and acoustic modes are less coupled than in conventional models. A vertical slice formulation is used which contains only dry dynamics. The model is shown to give physically reasonable results, and convective behaviour is generated by localised compressible effects. This model provides an affordable and flexible framework within which some of the complex issues of convective-scale DA can later be investigated. The model is called the ABC model after the three tunable parameters introduced: A (the pure gravity wave frequency), B (the modulation of the divergent term in the continuity equation), and C (defining the compressibility).
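The cost argument rests on the Courant-Friedrichs-Lewy (CFL) condition: an explicit scheme's time step is limited to roughly dt ≤ ν Δx / c, where c is the fastest wave speed resolved. The Python sketch below evaluates this bound. The Courant number ν = 1 is a placeholder, since the real limit depends on the scheme, and the wave speeds are illustrative inputs rather than values derived from the ABC model's A, B or C parameters.

```python
def max_explicit_timestep(dx_m: float, wave_speed_ms: float, courant: float = 1.0) -> float:
    """Largest stable time step (s) for an explicit scheme under dt <= courant * dx / c.

    The default Courant number of 1.0 is a placeholder; the actual limit is scheme-specific.
    """
    if dx_m <= 0 or wave_speed_ms <= 0:
        raise ValueError("grid spacing and wave speed must be positive")
    return courant * dx_m / wave_speed_ms
```

On a 1.5 km grid, reducing the fastest wave speed from 300 m/s to 50 m/s raises the bound from 5 s to 30 s, so six times fewer explicit steps are needed for the same simulated period.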
The information provided in the Intergovernmental Panel on Climate Change (IPCC; http://ipcc.ch) Assessment Reports (ARs) informs climate change policy development. Within the IPCC, the scientific coordination of the ARs is conducted by three Working Groups (WGs), comprising the Bureaus supported by their Technical Support Units (TSUs). Data management support is provided by the IPCC Data Distribution Centre (DDC; http://ipcc-data.org), which is overseen by the Task Group on Data Support for Climate Change Assessments (TG-Data; formerly TGICA). The DDC is a federated structure that is currently managed by the Centre for Environmental Data Analysis (CEDA; http://www.ceda.ac.uk/), United Kingdom; the World Data Center for Climate (WDCC; http://www.wdc-climate.de), Germany; and the Center for International Earth Science Information Network (CIESIN; http://www.ciesin.columbia.edu/) at Columbia University, U.S. For the IPCC Sixth Assessment cycle (AR6), analyses of climate simulations and observations published in the scientific literature will be assessed. The reports will include figures and tables prepared from the underlying digital information. The DDC plays an increasingly important role in facilitating the exchange of data, as well as in curating the assessed datasets, scripts and provenance records, in order to facilitate the assessment process and to support the traceability of AR6 results through long-term continuity of data management and curation. These issues, among others, are addressed by the DDC support group (https://cedadev.github.io/ipcc_ddc), currently consisting of members from the three TSUs and the three DDC managers.
Abstract The “signal‐to‐noise paradox” for seasonal forecasts of the winter North Atlantic Oscillation (NAO) is often described as an “underconfident” forecast and measured using the ratio‐of‐predictable components (RPCs) metric. However, comparison of RPC with other measures of forecast confidence, such as spread‐error ratios, can give conflicting impressions, challenging this informal description. We show, using a linear statistical model, that the “paradox” is equivalent to a situation where the reliability diagram of any percentile forecast has a slope exceeding 1. The relationship with spread‐error ratios is shown to be far less direct. We furthermore compute reliability diagrams of winter NAO forecasts using seasonal hindcasts from the European Centre for Medium‐range Weather Forecasts and the UK Meteorological Office. While these broadly exhibit slopes exceeding 1, there is evidence of asymmetry between upper and lower terciles, indicating a potential violation of linearity/Gaussianity. The limitations and benefits of reliability diagrams as a diagnostic tool are discussed.
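For reference, the RPC is often estimated as RPC = r / sqrt(σ²_sig / σ²_tot), where r is the correlation between the ensemble-mean forecast and the observations, σ²_sig is the variance of the ensemble mean and σ²_tot is the average variance of the individual members; values above 1 are the usual symptom of the "paradox". The Python sketch below implements that estimate. The array layout (years × members) and the use of unbiased (ddof=1) variances are assumptions, not details taken from the paper.

```python
import numpy as np


def ratio_of_predictable_components(ens: np.ndarray, obs: np.ndarray) -> float:
    """Estimate RPC = r / sqrt(var_signal / var_total).

    ens: hindcasts of shape (n_years, n_members); obs: observations of shape (n_years,).
    var_signal is the variance of the ensemble mean over years; var_total is the
    average over members of each member's variance over years.
    """
    ens_mean = ens.mean(axis=1)
    r = np.corrcoef(ens_mean, obs)[0, 1]
    var_signal = ens_mean.var(ddof=1)
    var_total = ens.var(axis=0, ddof=1).mean()
    return float(r / np.sqrt(var_signal / var_total))
```

Because the variance of the ensemble mean never exceeds the mean member variance, a perfectly correlated forecast (r = 1) always yields RPC ≥ 1 under this estimate.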
The 2020-2021 annual report for the Centre for Environmental Data Analysis (CEDA). This report presents key statistics for the year past (2020 - 2021) as well as a series of snapshots of activity, expressed as short highlights and short reports.
Abstract The Centre for Environmental Data Analysis (CEDA) provides two major services to the environmental science community: JASMIN and the CEDA Archive. CEDA is frequently required to evidence the impact it has on researchers and wider society. However, this is challenging, as there are currently no formal or standard processes for collecting impact information. To understand how CEDA could collect impact information, and to allow its users to shape this monitoring, over 500 users provided their opinions, preferences and suggestions on how to share impact, via six focus groups and an online survey. The results suggest that, whilst there was a high degree of willingness to provide impact information to CEDA, there remains confusion around what ‘impact’ is. Users are keen to share impact in ways which utilize existing processes, and at times which make sense for both the research and the impact, whilst also understanding the need and purpose for sharing that information.
Abstract. The PRIMAVERA project aimed to develop a new generation of advanced and well-evaluated high-resolution global climate models. As part of PRIMAVERA, seven different climate models were run in both standard and higher-resolution configurations, with common initial conditions and forcings to form a multi-model ensemble. The ensemble simulations were run on high-performance computers across Europe and generated approximately 1.6 PiB (pebibytes) of output. To allow the data from all models to be analysed at this scale, PRIMAVERA scientists were encouraged to bring their analysis to the data. All data were transferred to a central analysis facility (CAF), in this case the JASMIN super-data-cluster, where it was catalogued and details made available to users using the web interface of the PRIMAVERA Data Management Tool (DMT). Users from across the project were able to query the available data using the DMT and then access it at the CAF. Here we describe how the PRIMAVERA project used the CAF's facilities to enable users to analyse this multi-model dataset. We believe that PRIMAVERA's experience using a CAF demonstrates how similar, multi-institute, big-data projects can efficiently share, organise and analyse large volumes of data.
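The query step can be pictured as filtering a catalogue of dataset records by their attributes and returning paths at the analysis facility. The Python sketch below is purely hypothetical: the record fields, model names, experiment labels and paths are invented for illustration and do not reflect the DMT's actual schema or interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """One catalogued dataset. These fields are illustrative, not the DMT schema."""
    model: str
    experiment: str
    variable: str
    resolution: str
    path: str


def query(catalogue, **criteria):
    """Return the records whose attributes equal every supplied criterion."""
    return [rec for rec in catalogue
            if all(getattr(rec, key) == value for key, value in criteria.items())]


# Hypothetical catalogue entries; a real central facility would hold many thousands.
CATALOGUE = [
    DatasetRecord("ModelA", "exp1", "tas", "high", "/hypothetical/modela/high/tas"),
    DatasetRecord("ModelA", "exp1", "tas", "standard", "/hypothetical/modela/std/tas"),
    DatasetRecord("ModelB", "exp1", "pr", "high", "/hypothetical/modelb/high/pr"),
]
```

Calling `query(CATALOGUE, variable="tas", resolution="high")` returns only the first record, whose path a user could then open directly at the analysis facility instead of downloading the data.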
The need to apply complex algorithms to large volumes of data is boosting the development of technological solutions able to satisfy big data analytics needs in Cloud and HPC environments. In this context, Ophidia represents a big data analytics framework for eScience, offering a cross-domain solution for managing scientific, multi-dimensional data. It also exploits in-memory distributed data storage and supports the submission of complex workflows through various interfaces compliant with well-known standards. This paper presents some applications of Ophidia to the computation of climate indicators defined in the CLIPC project, the WPS interface used for submission, and the workflow-based approach employed.
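Many climate indicators reduce a daily time series at each grid cell to a count or summary statistic. As a hedged example (not Ophidia's operators, and not necessarily one of the CLIPC indicators), the Python sketch below counts the days whose maximum temperature exceeds a threshold along the time axis of a daily array.

```python
import numpy as np


def count_days_above(tmax_c: np.ndarray, threshold_c: float = 25.0) -> np.ndarray:
    """Count days with daily maximum temperature strictly above threshold_c.

    tmax_c has time on axis 0, e.g. shape (n_days, n_lat, n_lon); the result has the
    remaining spatial shape. The 25 degC default is only an example threshold.
    """
    return np.sum(tmax_c > threshold_c, axis=0)
```

In a distributed framework the same reduction would run in parallel over chunks of the spatial domain; this sketch shows only the per-cell logic.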
We introduce the rationale for, and architecture of, the European Space Agency Climate Change Initiative (CCI) Open Data Portal (http://cci.esa.int/data/). The Open Data Portal hosts a set of richly diverse datasets – 13 “Essential Climate Variables” – from the CCI programme in a consistent and harmonised form, and provides a single point of access to the (>100 TB) data for broad dissemination to an international user community. These data have been produced by a range of different institutions and vary in both their scientific and their spatio-temporal characteristics. This heterogeneity of the data, together with the range of services to be supported, presented significant technical challenges. An iterative development methodology was key to tackling these challenges: the system developed exploits a workflow which takes data that conform to the CCI data specification, ingests them into a managed archive and uses both manual and automatically generated metadata to support data discovery, browse, and delivery services. It utilises both Earth System Grid Federation (ESGF) data nodes and the Open Geospatial Consortium Catalogue Service for the Web (OGC-CSW) interface, serving data into both the ESGF and the Global Earth Observation System of Systems (GEOSS). A key part of the system is a new vocabulary server, populated with CCI-specific terms and relationships, which integrates the OGC-CSW and ESGF search services and was developed through a dialogue between domain scientists and linked data specialists. These services have enabled the development of a unified user interface for graphical search and visualisation – the CCI Open Data Portal Web Presence.
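Discovery through an OGC-CSW interface works by sending catalogue queries such as GetRecords. The Python sketch below builds a CSW 2.0.2 GetRecords request in key-value-pair form with a CQL full-text constraint. The endpoint URL is a placeholder, not the portal's actual service address, and the chosen element set and record limit are illustrative.

```python
from urllib.parse import urlencode


def csw_getrecords_url(endpoint: str, search_text: str, max_records: int = 10) -> str:
    """Build a CSW 2.0.2 GetRecords KVP request with a CQL AnyText constraint."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "elementSetName": "summary",
        "resultType": "results",
        "constraintLanguage": "CQL_TEXT",
        "constraint_language_version": "1.1.0",
        # Full-text match on any queryable text field of the catalogue record.
        "constraint": f"AnyText like '%{search_text}%'",
        "maxRecords": str(max_records),
    }
    return f"{endpoint}?{urlencode(params)}"
```

For example, `csw_getrecords_url("https://example.org/csw", "ozone")` yields a URL that any HTTP client could send to a CSW 2.0.2 server.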
Abstract. The Community Intercomparison Suite (CIS) is an easy-to-use command-line tool which has been developed to allow the straightforward intercomparison of remote sensing, in-situ and model data. While there are a number of tools available for working with climate model data, the large diversity of sources (and formats) of remote sensing and in-situ measurements necessitated a novel software solution. Developed by a professional software company, CIS supports a large number of gridded and ungridded data sources "out-of-the-box", including climate model output in NetCDF or the UK Met Office pp file format, CALIOP (Cloud-Aerosol Lidar with Orthogonal Polarization), MODIS (MODerate resolution Imaging Spectroradiometer), Cloud and Aerosol CCI (Climate Change Initiative) level 2 satellite data, and a number of in-situ aircraft and ground station datasets. The open-source architecture also supports user-defined "plugins" to allow many other sources to be easily added. Many of the key operations required when comparing heterogeneous datasets are provided by CIS, including subsetting, aggregating, collocating and plotting the data. Output data is written to CF-compliant NetCDF files to ensure interoperability with other tools and systems. The latest documentation, including a user manual and installation instructions, can be found on our website (http://cistools.net). Here we describe the need which this tool fulfils, followed by descriptions of its main functionality (as at version 1.3.2) and plugin architecture which make it unique in the field.
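Aggregation, another of the listed operations, can mean averaging scattered observations into grid boxes. A minimal Python sketch using NumPy's `histogram2d` is given below. It illustrates the idea rather than reproducing CIS's aggregation code, and it treats latitude and longitude as plain Cartesian coordinates.

```python
import numpy as np


def aggregate_to_grid(lat, lon, values, lat_edges, lon_edges):
    """Mean of scattered point values within each (lat, lon) grid box.

    Boxes containing no points are returned as NaN.
    """
    lat = np.asarray(lat)
    lon = np.asarray(lon)
    values = np.asarray(values, dtype=float)
    # Sum of values and number of points falling in each box.
    sums, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges], weights=values)
    counts, _, _ = np.histogram2d(lat, lon, bins=[lat_edges, lon_edges])
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)
```

Other aggregation kernels (median, standard deviation, point count) would follow the same box-assignment step with a different reduction.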
Many of the figures in the WGI contribution to the IPCC Sixth Assessment Report (AR6) are derived from the data of multiple CMIP6 simulations. For instance, a plot showing projections of global temperature change in Figure 2 of Chapter 4 of the IPCC AR6 is based on data from 183 CMIP6 simulation datasets. The figure helpfully tells us which CMIP6 experiments were used as input data but does not provide information about the models that ran the simulations. It is possible to deduce the specific input data from supplementary tables in the IPCC assessment report and from within the report’s annexes. However, these information sources are not machine-accessible, so they are difficult to use for tracing purposes; they are not sufficient to give credit, because they are not picked up by indexing services; and they are difficult to find, because they are not part of the printed report. Even if we gather this knowledge to create a navigable provenance network for the figure, we are still left with the unwieldy prospect of rendering 183 data citations for an outwardly simple plot. We require a compact way to provide traceable provenance for large input data networks, one that makes transparent the specific input data used to create the CMIP6-based figures in IPCC AR6 and gives credit to modelling centres for the effort of running the simulations. This is the so-called complex citation problem discussed within the RDA Complex Citation Working Group. We present a pragmatic solution to the complex citation challenge that uses an existing public infrastructure technology, Zenodo. The work establishes traceability by collating references to a figure’s input datasets within a Zenodo record, and establishes credit via Zenodo’s relatedWorks feature/DataCite’s relations, which link to existing data objects through Persistent Identifiers (PIDs), in this case the CMIP6 data citations.
Whilst a range of PIDs exist to support connections between objects, DOIs are widely used for citations and are well connected within the wider PID graph landscape, and Zenodo provides a tool to create objects that use the DOI schema provided by DataCite. CMIP6 data citations have sufficient granularity to assign credit, but not fine enough granularity for traceability; therefore Zenodo reference handle groups are used to identify specific input datasets, and Zenodo connected objects provide the join between them. There is still work to be done to establish full visibility of the credit referenced within the Zenodo records. However, we hope to engage the community by presenting our pragmatic solution to the complex citation challenge, one that has the potential to give modelling centres a more complete picture of the impact of their simulations.
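To make the relatedWorks/DataCite linkage concrete, the Python sketch below assembles deposition metadata for a figure record whose related identifiers point at its input-data DOIs. The field names follow our understanding of Zenodo's legacy REST deposition metadata (`title`, `upload_type`, `description`, `creators`, `related_identifiers`) and should be checked against current Zenodo documentation. The DOIs and creator are placeholders, and the `references` relation is one plausible choice, not the authors' confirmed setting.

```python
def figure_provenance_metadata(title, dataset_dois, creator="Example, Author"):
    """Build Zenodo-style deposition metadata linking a figure record to input datasets.

    Field names are an assumption based on Zenodo's legacy deposition API;
    the relation type 'references' is illustrative.
    """
    return {
        "metadata": {
            "title": title,
            "upload_type": "dataset",
            "description": "Input datasets used to create the figure.",
            "creators": [{"name": creator}],
            # One related identifier per input-data citation DOI.
            "related_identifiers": [
                {"identifier": doi, "relation": "references"} for doi in dataset_dois
            ],
        }
    }
```

A single record built this way can stand in for hundreds of individual data citations while still resolving to each of them through its PIDs.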
Within the Environmental 'omics community, Bio-Linux is a widely used tool. It has the advantage of providing, in a single deliverable package, all the software and tools needed to support common analyses. With the growth in data volumes within the community, and increasing constraints on users' access to and control over their own desktops, an alternative delivery method for Bio-Linux and, in future, the Docker container environment is necessary. Within the EOS Cloud project we have constructed a Desktop as a Service system to centrally host virtual machines with these tools preconfigured and maintained. To enable efficient use of the resources, we have enabled user-controlled resource scaling, so that users can use small-scale VMs for task configuration and data manipulation and boost to a larger scale to run analysis applications, all the while maintaining the user environment in a consistent manner. Alongside this, within the project we have developed tools to simplify the increasingly popular Docker software usage model. This includes ensuring uniformity of behaviour between the host system and the running Docker container. Within the invitation-only trial user community, we identify two exemplar groups and explain their usage and how the products and services developed within the project are useful to them. We conclude by discussing the usefulness of Desktop as a Service: it is of great benefit to the bioinformatics community, but could also be of great use elsewhere, wherever a stable user environment is needed with applications already available and without reliance on local ICT support.
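One common way to keep behaviour consistent between host and container is to run the container as the host user and mount the user's working directory, so that files written inside the container keep the user's ownership. The Python sketch below builds such a `docker run` command line using the standard `--user`, `-v` and `-w` options. It is an assumption about what "uniformity of behaviour" might involve, not a description of the EOS Cloud tools; `os.getuid` and `os.getgid` are POSIX-only.

```python
import os


def docker_run_args(image, host_dir, container_dir="/data", uid=None, gid=None):
    """Build a `docker run` argument list that runs as the host user.

    The host directory is bind-mounted and used as the working directory, so output
    files are owned by the invoking user. uid/gid default to the current process's IDs.
    """
    uid = os.getuid() if uid is None else uid
    gid = os.getgid() if gid is None else gid
    return [
        "docker", "run", "--rm",
        "--user", f"{uid}:{gid}",
        "-v", f"{host_dir}:{container_dir}",
        "-w", container_dir,
        image,
    ]
```

The returned list can be passed to `subprocess.run` with the command to execute appended, for example `docker_run_args("ubuntu", "/home/alice/work") + ["ls"]`.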
Abstract While a significant amount of attention surrounding climate change has focused on mitigating its causes, there is growing interest in, and need for, adaptation to physical climate change impacts, both those already being experienced and those anticipated in future. Changes in climate have the potential to create hazards in the oil and gas sector, although vulnerabilities to these changes are often specific to asset types. Preparedness for climate change can help to reduce damaging effects from acute as well as chronic climate changes. This paper focuses on a simple approach developed to ensure that climate change is included in engineering design, by incorporating climate change risk, and the uncertainty inherent in future climate projections, into design requirements. It involves using the best available climate change data and an understanding of the relationships between asset performance and environmental (climate-related) conditions. The risk level associated with climate change for a specific asset is determined by considering the severity and confidence level of the climate change hazard, the exposure of the asset to the hazard, the vulnerability of the exposed asset to the hazard and the capacity of the asset to adapt to the hazard. The method considers the risk levels, the selection of climate model data, the ‘natural variability’ baseline period to be applied to the climate change data, the validation of the climate change models, the asset lifetime and, specifically, how to modify metocean design criteria to account for climate change, so that both the ‘start of life’ criteria (typically derived from observed and hindcast data) and the ‘end of life’ criteria (including an estimate of the impact of climate change at the end of the asset life) meet the required annual probability of exceedance.
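The link between an annual probability of exceedance and an asset's life can be made explicit: if p_t is the annual exceedance probability in year t and years are treated as independent, the chance of at least one exceedance over the life is 1 − ∏(1 − p_t). Letting p_t change from year to year is one simple way to represent a shifting climate. The Python sketch below evaluates this expression; it is a generic reliability calculation, not the paper's method, and any probabilities passed to it are illustrative.

```python
import math


def lifetime_exceedance_probability(annual_probs):
    """Probability of at least one exceedance over an asset life.

    annual_probs holds one annual exceedance probability per year of life; years are
    assumed independent, giving 1 - prod(1 - p_t).
    """
    probs = list(annual_probs)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("annual probabilities must lie in [0, 1]")
    return 1.0 - math.prod(1.0 - p for p in probs)
```

With a constant 1-in-100-year criterion (p = 0.01) over a 25-year life, the lifetime probability is about 0.22; supplying a rising sequence of p_t instead shows how a climate-driven shift would raise it.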
The Community Intercomparison Suite (CIS) is an easy-to-use command-line tool which has been developed to allow the straightforward intercomparison of remote sensing, in-situ and model data. Developed by a professional software company, CIS supports a large number of gridded and ungridded data sources "out-of-the-box", including climate model output in NetCDF or pp file format, CALIOP, MODIS, Cloud and Aerosol CCI level 2 satellite data, and a number of other in-situ aircraft and ground station datasets. The open-source architecture also supports user-defined "plugins" to allow many other sources to be easily added. Many of the key operations required when comparing heterogeneous datasets are provided by CIS, including subsetting, aggregating, collocating and visualising the data. Output data is written to CF-compliant NetCDF files to ensure interoperability with other tools and systems. The latest documentation, including a user manual and installation instructions, can be found on our website (http://cistools.net).
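Subsetting, the first operation listed, amounts to selecting the points that fall inside a coordinate range. A minimal NumPy sketch follows. It is not CIS's subset command, and it assumes all longitudes use one consistent convention, with no range crossing the dateline.

```python
import numpy as np


def subset_points(lat, lon, values, lat_range, lon_range):
    """Keep only the points whose coordinates lie within inclusive (min, max) ranges."""
    lat = np.asarray(lat)
    lon = np.asarray(lon)
    values = np.asarray(values)
    mask = ((lat >= lat_range[0]) & (lat <= lat_range[1])
            & (lon >= lon_range[0]) & (lon <= lon_range[1]))
    return lat[mask], lon[mask], values[mask]
```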
Abstract. The PRIMAVERA project aimed to develop a new generation of advanced and well-evaluated high-resolution global climate models. As part of PRIMAVERA, seven different climate models were run in both standard and higher-resolution configurations, with common initial conditions and forcings to form a multi-model ensemble. The ensemble simulations were run on high-performance computers across Europe and generated approximately 1.6 pebibytes of output. To allow the data from all models to be analysed at this scale, PRIMAVERA scientists were encouraged to bring their analysis to the data. All data was transferred to a Central Analysis Facility (CAF), in this case the JASMIN super-data-cluster, where it was catalogued and details were made available to users through the PRIMAVERA Data Management Tool's (DMT's) web interface. Users from across the project were able to query the available data using the DMT and then access it at the CAF. Here we describe how the PRIMAVERA project used the CAF's facilities to enable users to analyse this multi-model data set. We believe that PRIMAVERA's experience using a CAF demonstrates how similar multi-institute, big-data projects can efficiently share, organise and analyse large volumes of data.
This report is a deliverable for the AMPLIFY-EDS project. It summarises the key activities undertaken by the Environmental Data Service (EDS) at the NERC Digital Gathering and provides overall feedback and recommendations on the event. The purpose of this report is to:
• Provide a citeable output from the AMPLIFY-EDS project
• Inform future EDS direction based on feedback, lessons learnt and discussions with users
• Inform NERC Digital Gathering event organisers of lessons learnt from the perspective of the EDS
The UKRI Net Zero DRI Scoping Project produced evidence-based recommendations. This dataset provides an interactive way to list all of the recommendations and actions from the project. These files show the mapping and links between the specific recommendations and the different strategic themes and roadmap actions proposed by the project. Three file formats are provided for ease of use: Excel, PDF and CSV. When opened in a spreadsheet, the recommendations can be sorted or filtered by a range of interest areas, e.g. type of evidence, delivery pathway, strategic theme and delivery year. The spreadsheet has two tabs. The first tab shows the recommendations discussed in the 'toolkit' section of the project's final technical report. The second tab shows the actions discussed in the 'roadmap' section of the report. For full details of the evidence that underpins these recommendations, please see the technical report: https://doi.org/10.5281/zenodo.8199984 The authors cited here are those who produced the uploaded files. Many other contributors are named in the technical report, and these files are a synthesis of their work. Please also cite the technical report if you use this dataset.
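Besides using a spreadsheet, the CSV file can be filtered programmatically. The Python sketch below uses the standard `csv` module. The column names in the test data ("Recommendation", "Strategic theme") are hypothetical and should be replaced with the headers actually present in the file.

```python
import csv
import io


def filter_recommendations(csv_text, column, value):
    """Return the rows (as dicts) whose `column` exactly matches `value`.

    csv_text is the full CSV content; column names are whatever the header row defines.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if (row.get(column) or "").strip() == value]
```

In practice the text would come from the downloaded file, e.g. `open(path, newline="", encoding="utf-8").read()`.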
This presentation gives an overview of the Natural Environment Research Council - Environmental Data Service Centre for Environmental Data Analysis (NERC-EDS CEDA) archive, presented to the UrbanAir project group, 26th November 2025. A brief description of the CEDA archive and JASMIN analysis platform is given, followed by examples of Air Quality (AQ) datasets held at CEDA.