Beyond point predictions: Quantifying uncertainty in E. coli ML-based monitoring

Abert-Fernández, David; Aguilera, Ester; Emiliano, Pere; Valero, Fernando; Monclús Sales, Hèctor

2025-10

Full text

1-s2.0-S2214714425018070-main.pdf 5.012 Mb | PDF

Machine learning regression models are increasingly used to improve the management, decision-making, and monitoring of drinking water quality, drawing on the growing volume of data from real-time sensors and laboratory analyses. However, most models provide only point predictions and ignore the inherent uncertainty caused by unobserved factors, which can produce different outcomes under similar conditions. This study benchmarks state-of-the-art regression algorithms and uncertainty quantification methods for predicting E. coli concentrations in a drinking water catchment. Gradient-boosted decision trees (GBDT) proved effective for real-time tracking, with CatBoost achieving the lowest error (RMSLE = 0.877), improving on the naïve baseline (1.160) and outperforming Random Forest by 5 %. Uncertainty quantification techniques generated valid prediction intervals that identify high-risk contamination events, with Conformalized Quantile Regression emerging as the most reliable method. By combining accurate GBDT predictions with well-calibrated uncertainty estimates, this approach enhances microbial water quality forecasting, improving risk assessment and supporting more robust decision-making in drinking water management.
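To make the two quantities named in the abstract concrete, the following is a minimal sketch of an RMSLE metric and of the calibration step of Conformalized Quantile Regression in its standard split-conformal form. It is not the authors' implementation: the function names `rmsle`, `cqr_correction` and `cqr_interval` are hypothetical, the `log(1 + x)` form of RMSLE is an assumption (the record does not state which logarithm the paper uses), and the lower and upper quantile predictions are assumed to come from any already fitted quantile regressor.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error, assuming the log(1 + x) form."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


def cqr_correction(y_cal, lo_cal, hi_cal, alpha=0.1):
    """Conformal correction computed on a held-out calibration set.

    lo_cal and hi_cal are the lower and upper quantile predictions of a
    previously fitted quantile model (assumed, not shown here).
    """
    y_cal = np.asarray(y_cal, dtype=float)
    lo_cal = np.asarray(lo_cal, dtype=float)
    hi_cal = np.asarray(hi_cal, dtype=float)
    # Conformity score: positive when y falls outside [lo, hi], negative inside.
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = scores.size
    # Rank of the finite-sample-corrected (1 - alpha) empirical quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for the requested coverage level.
        return float("inf")
    return float(np.sort(scores)[k - 1])


def cqr_interval(lo_new, hi_new, correction):
    """Widen (or shrink) the raw quantile interval by the conformal correction."""
    lo_new = np.asarray(lo_new, dtype=float)
    hi_new = np.asarray(hi_new, dtype=float)
    return lo_new - correction, hi_new + correction
```

In use, `cqr_correction` would be called once on calibration data that the quantile model never saw during training, and `cqr_interval` would then be applied to each new prediction. For a non-negative target such as an E. coli concentration, the lower bound could additionally be clipped at zero.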

This document is licensed under a Creative Commons license: Attribution - NonCommercial (by-nc)

Identifiers

http://hdl.handle.net/10256/27323

doi: 10.1016/j.jwpe.2025.108734

eissn: 2214-7144

Projects

Name: DESARROLLO DE UNA METODOLOGIA PARA UNA GESTION RESILIENTE EN LOS SISTEMAS DE TRATAMIENTO DE AGUA POTABLE. DE LA INVESTIGACION APLICADA A LA VALIDACION A ESCALA REAL

Identifier: PID2020-112615RA-I00

Program: Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020
