Correction of Low-Cost Sensor measurements and Anomaly Detection in Air Quality Data: training of supervised and unsupervised models for PM2.5 analysis

NANNI, SARA
2024/2025

Abstract

Low-cost sensors (LCS) for air pollution detection are increasingly widespread. They support a uniformly distributed monitoring network across a territory, raise community awareness of environmental issues, and help implement solutions that safeguard the health of the entire ecosystem. However, unlike the regulatory-grade instruments operated by agencies such as ARPA, LCS are not subject to universally established quality standards. The sensor type itself strongly influences the quality of the recorded data, which varies widely. Raw measurements therefore need thorough evaluation and accurate recalibration before they can be used responsibly. This study presents the analysis and field calibration of fine particulate matter (PM2.5) measurements from a low-cost monitoring network currently operating in the Emilia-Romagna region. Comparing these data with ARPA-certified averages, obtained by co-locating the monitoring stations, revealed significant variability and a substantial overestimation of pollutant levels by the LCS. Universal laboratory corrections and analytical corrective formulas reduce the distortion in sensor readings but only partially address the high variability, because they do not account for many influencing factors, such as atmospheric conditions and sensor type. Ten months of readings from LoRaWAN environmental sensors were used to train machine learning models and build a reliable calibration system. This system corrects sensor readings against certified ArpaE data and also identifies anomalies and unusual peaks associated with specific stations. The model predictions showed substantially better data quality than either the raw data or the laboratory corrections.
The calibration methodology proved robust across contexts: the results closely followed the trends of the certified data and reflected seasonal variations. Supervised machine learning models used to calibrate daily average readings agreed well with the certified reference data, achieving an absolute error of approximately 2.5 over 24 hours. Unsupervised learning and clustering techniques enabled a deeper analysis of the PM2.5 concentrations monitored across the entire low-cost network, providing insight into the spatiotemporal variation of PM2.5 in the Emilia-Romagna region. Anomaly detection improved the understanding of local conditions near each installed station, revealing how pollutant patterns vary with climate and season, and how human activities or other external factors contribute to increased emissions.
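As a rough illustration of the workflow the abstract describes (calibrating co-located readings against a reference, scoring the result with an absolute error, and flagging unusual peaks), the sketch below may help. It is not the thesis's method: the abstract does not name the models used, so a single-feature least-squares line stands in for the supervised model and a median/MAD robust z-score stands in for the anomaly detector. All function names, the threshold, and the numbers are hypothetical and invented for illustration only.

```python
from statistics import mean, median


def fit_linear_calibration(raw, ref):
    """Ordinary least squares fit of ref ~ slope * raw + intercept."""
    raw_mean, ref_mean = mean(raw), mean(ref)
    sxx = sum((x - raw_mean) ** 2 for x in raw)
    sxy = sum((x - raw_mean) * (y - ref_mean) for x, y in zip(raw, ref))
    slope = sxy / sxx
    intercept = ref_mean - slope * raw_mean
    return slope, intercept


def mean_absolute_error(pred, ref):
    """Average absolute difference between predictions and reference."""
    return mean([abs(p - r) for p, r in zip(pred, ref)])


def flag_anomalies(values, threshold=3.5):
    """Return indices whose robust z-score (median / MAD) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread, nothing can be scored
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical daily averages from one co-located station pair (made-up values).
raw_lcs = [30.0, 42.0, 55.0, 18.0, 61.0, 25.0, 48.0, 36.0]
reference = [14.0, 20.0, 27.0, 9.0, 29.0, 12.0, 23.0, 17.0]

slope, intercept = fit_linear_calibration(raw_lcs, reference)
calibrated = [slope * x + intercept for x in raw_lcs]

raw_mae = mean_absolute_error(raw_lcs, reference)
calibrated_mae = mean_absolute_error(calibrated, reference)

# Hypothetical calibrated series from another station, with one unusual peak.
station_series = [12.0, 13.0, 11.5, 12.5, 80.0, 12.2, 11.8]
anomalous_days = flag_anomalies(station_series)
```

In practice a calibration model would likely also take covariates such as temperature and humidity, since the abstract notes that atmospheric conditions affect the readings, and it would be evaluated on held-out days rather than on the data used for fitting.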
2024
Air Quality Data
Anomaly Detection
Data Correction
Machine learning
Data Analysis
Files in this item:
Nanni_Sara.pdf (open access, 4.94 MB, Adobe PDF)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/81806