On the use of machine learning approaches for ensemble prediction of the sea state

NIERO, CAMILLA
2024/2025

Abstract

Numerical Weather Prediction systems, based on deterministic models initialised with the best available atmospheric conditions, have evolved into Ensemble Prediction Systems, which provide probabilistic forecasts and quantify forecast uncertainty by accounting for the chaotic and dynamic nature of the atmosphere and its strong sensitivity to initial conditions. Accordingly, world-leading weather forecasting centres now provide both deterministic forecasts and up to 50 alternative realisations of the global atmospheric forecast. The latter are generated by slightly perturbing the best available initial conditions and the physical parameters of the model. Regional forecasting centres use the forecast wind field at the sea surface as input to generate high-resolution local sea state (namely, wave) forecasts for operational purposes. However, due to computational constraints, they are often unable to process the full ensemble dataset. For this reason, the original (e.g., 50-member) ensemble may be reduced in size while retaining its most significant characteristics and as much information content as possible. This thesis explores the use of machine learning techniques to cluster ensemble members with similar characteristics, thereby reducing redundancy and enabling the selection of so-called Representative Ensemble Members (REMs). The work was conducted in collaboration with, and under the supervision of, CNR-ISMAR and the DAIS Department of Ca' Foscari University. If effective, the proposed approach could be integrated into the operational workflow of local probabilistic sea state forecasting, supporting the transition of the sea state forecasting system for the Municipality of Venice (developed by CNR-ISMAR and currently in use by ARPA Veneto and OSMER-FVG) towards a probabilistic, ensemble-based framework (Barbariol et al., 2022).
When the clustering algorithm is allowed to determine the number of clusters on its own, we find that applying it directly to the raw data (pure clustering), our baseline approach, performs well across all selected evaluation metrics, even without any dimensionality reduction during preprocessing. However, Principal Component Analysis (PCA) yields excellent results when the explained variance is between 70% and 98.5%. Within this interval, it consistently outperforms pure clustering and identifies a higher number of clusters. In this setting, none of the implemented autoencoder architectures provides a clear advantage over the baseline. When the focus shifts to how the methods depend on specific problem parameters (e.g., the number of clusters and the dimension of the latent space), no single method emerges as clearly superior in all scenarios, nor does any latent space dimension stand out as universally beneficial. The most notable finding is that, for all methods employing dimensionality reduction prior to clustering, when the number of clusters is set between 17 and 20, the ensemble reconstructed using REMs closely matches the original 50-member ensemble. With only a few exceptions, linked to specific methods and latent dimensions and likely attributable to dataset variability, these reconstructions consistently outperform those obtained by applying the clustering algorithm directly to the data. Given the practical need to reduce the size of the original ensemble, the ideal target is to retain around ten REMs. Although 17 to 20 may still be too many, these findings provide a promising starting point. Due to time constraints, this master's thesis develops only the method for selecting the representative wind ensemble members needed to force the wave prediction; the wave forecast itself was not carried out.
The research conducted so far remains at a preliminary stage. The next step, already being trialled at CNR-ISMAR in Venice, is to use the selected REMs for sea state forecasting.
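
As an illustration only, the idea of selecting REMs by clustering can be sketched as follows. The abstract does not name the clustering algorithm or say how the REM-based ensemble is reconstructed, so this sketch makes several assumptions: plain k-means with a fixed number of clusters, synthetic random vectors standing in for flattened wind fields, the member closest to each cluster centroid taken as the REM, and REM weights proportional to cluster size. All function and variable names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(members, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the thesis' algorithm)."""
    rng = random.Random(seed)
    centroids = [list(m) for m in rng.sample(members, k)]
    labels = None
    for _ in range(n_iter):
        # Assign every member to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(m, centroids[c]))
            for m in members
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Move each centroid to the mean of its assigned members.
        for c in range(k):
            assigned = [m for m, lab in zip(members, labels) if lab == c]
            if assigned:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(assigned) for col in zip(*assigned)]
    return labels, centroids


def select_rems(members, labels, centroids):
    """Map each REM index to a weight equal to its cluster's share of members."""
    rems = {}
    for c, centroid in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        # Assumption: the REM is the actual member nearest the cluster centroid.
        rep = min(idx, key=lambda i: squared_distance(members[i], centroid))
        rems[rep] = len(idx) / len(members)
    return rems


# Synthetic 50-member "ensemble": each member is a short vector standing in
# for a flattened wind field, scattered around five made-up regimes.
data_rng = random.Random(42)
regimes = [[data_rng.uniform(-5.0, 5.0) for _ in range(8)] for _ in range(5)]
ensemble = [
    [v + data_rng.gauss(0.0, 0.5) for v in regimes[i % 5]] for i in range(50)
]

labels, centroids = kmeans(ensemble, k=10)
rems = select_rems(ensemble, labels, centroids)
print(sorted(rems.items()))
```

Choosing an actual member rather than the centroid itself keeps each REM a physically consistent model realisation, which matters if it is later used to force a wave model; the cluster-size weights are one simple way to let a reduced ensemble approximate the spread of the full one.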
weather forecast
ensemble prediction
machine learning
wave forecasting
Files in this item:
File: master_thesis_niero_camilla_2025.pdf
Access: under embargo until 16/07/2026
Size: 14.91 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/89167