Environmental Information Data Centre
Lancaster, United Kingdom
Research output, citation impact, and the most-cited recent papers from Environmental Information Data Centre (United Kingdom). Aggregated across the NobleBlocks index of 300M+ scholarly works.
Top-cited papers from Environmental Information Data Centre
Aim: Integrated species distribution modelling has emerged as a useful tool for ecologists to exploit the range of information available on where species occur. In particular, the ability to combine large numbers of ad hoc or presence‐only (PO) records with more structured presence–absence (PA) data can allow ecologists to account for biases in PO data which often confound modelling efforts. A range of modelling techniques have been suggested to implement integrated species distribution models (IDMs): joint likelihood models, including one dataset as a covariate or informative prior, and fitting a correlation structure between datasets. We aim to investigate the performance of different types of integrated models under realistic ecological data scenarios. Innovation: We use a virtual ecologist approach to investigate which integrated model is most advantageous under varying levels of spatial bias in PO data, sample size of PA data and spatial overlap between datasets. Main conclusions: Joint likelihood models were the best performing models when spatial bias in PO data was low, or could be modelled, but gave poor estimates when there were unknown biases in the data. Correlation models provided good model estimates even when there were unknown biases and when good quality PA data were spatially limited. Including PO data via an informative prior provided little improvement over modelling PA data alone and was inferior to using either the joint likelihood or correlation approach. Our results suggest that correlation models provide a robust alternative to joint likelihood models when covariates related to effort or detection in PO data are not available. Ecologists should be aware of the limitations of each approach and consider how well biases in the data can be modelled when deciding which type of IDM to use.
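For readers unfamiliar with the joint likelihood approach named in this abstract, one common formulation is sketched below. It is an illustration under stated assumptions, not necessarily the specification used in the paper: PO records are treated as an inhomogeneous Poisson point process over a study region D, PA surveys as Bernoulli outcomes with a complementary log-log link, and the two share the environmental coefficients β. The symbols x(s) (environmental covariates), z(s) (covariates describing PO sampling bias), δ, and the dataset-specific intercepts are all assumed notation.

```latex
\begin{align}
  % PO data: point-process intensity with shared beta and a bias term
  \lambda(s) &= \exp\{\alpha_{PO} + \beta^{\top} x(s) + \delta^{\top} z(s)\} \\
  \ell_{PO} &= \sum_{i=1}^{n} \log \lambda(s_i) - \int_{D} \lambda(s)\,\mathrm{d}s \\
  % PA data: probability of presence at survey site j (cloglog link)
  p_j &= 1 - \exp\{-\exp(\alpha_{PA} + \beta^{\top} x(s_j))\} \\
  \ell_{PA} &= \sum_{j=1}^{m} \bigl[\, y_j \log p_j + (1 - y_j) \log(1 - p_j) \,\bigr] \\
  % Joint likelihood: both datasets inform the shared beta
  \ell_{\text{joint}} &= \ell_{PO} + \ell_{PA}
\end{align}
```

In this sketch the bias term δᵀz(s) appears only in the PO component. That is why such a model can perform well when the bias is captured by available covariates and poorly when it is not, consistent with the abstract's conclusions.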
Nitrogen (N) and phosphorus (P) are essential nutrients that are necessary for plant growth and that support life in aquatic ecosystems. However, excessive N and P can lead to algal blooms that deplete oxygen, kill fish, and release toxins that are harmful to humans. Estimates of N and P levels in rivers are typically calculated at station or grid (>1 km) scale; therefore, it is difficult to visualise the evolution of water quality as water travels downstream. Using a high-resolution reach-scale river network and associating each reach with land cover fractions and catchment descriptors, we trained random forest models on aggregated data (2010–2020) from the Environmental Agency Open Water Quality Data Archive for 2,343 stations to predict long-term nitrate and orthophosphate concentrations at each river reach in Great Britain (GB). We separated the model training and predictions for different seasons to investigate the potential difference in feature importance. Our model predicted concentrations with an average testing coefficient of determination (R²) of 0.71 for nitrate and 0.58 for orthophosphate using 5-fold cross-validation. Our model showed slightly better performance for higher Strahler stream orders, highlighting the challenges of making predictions in small streams. Our results revealed that arable and horticultural land use is the strongest and most reliable predictor for nitrate, while floodplain extents and standard percentage runoff are stronger predictors for orthophosphate. Nationally, higher orthophosphate concentrations were observed in urbanised areas. This study shows how combining a river network model with machine learning can provide a network-wide understanding of the spatial distribution of water quality levels.
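The evaluation described in this abstract (a random forest scored by R² under 5-fold cross-validation, followed by an inspection of feature importance) can be sketched roughly as follows with scikit-learn. The data here are synthetic and the feature names (arable_fraction, urban_fraction, runoff_pct) are hypothetical stand-ins for the land cover fractions and catchment descriptors. Nothing below reproduces the paper's data, hyperparameters, or results.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for per-reach predictors (hypothetical names, not the paper's).
rng = np.random.default_rng(0)
n_reaches = 400
feature_names = ["arable_fraction", "urban_fraction", "runoff_pct"]
X = rng.uniform(0.0, 1.0, size=(n_reaches, len(feature_names)))

# Synthetic "nitrate" target driven mostly by the first feature, plus noise.
y = 10.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.5, size=n_reaches)

# Random forest scored by R^2 on each of 5 held-out folds.
model = RandomForestRegressor(n_estimators=100, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
mean_r2 = scores.mean()

# Refit on all data to read impurity-based feature importances.
model.fit(X, y)
importances = dict(zip(feature_names, model.feature_importances_))
top_feature = max(importances, key=importances.get)

print(f"mean test R^2 over {len(scores)} folds: {mean_r2:.2f}")
print(f"most important feature: {top_feature}")
```

A per-season analysis like the one the abstract mentions could repeat this loop on each seasonal subset and compare the resulting importance rankings.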