On the use of machine learning approaches for ensemble prediction of the sea state

Numerical Weather Prediction systems, based on deterministic models initialised with the best available atmospheric conditions, have evolved into Ensemble Prediction Systems, which provide probabilistic forecasts and quantify forecast uncertainty by accounting for the chaotic and dynamic nature of the atmosphere and its strong sensitivity to initial conditions. Accordingly, world-leading weather forecasting centres now provide both deterministic forecasts and up to 50 alternative realisations of the global atmospheric forecast. The latter are generated by slightly perturbing the best available initial conditions and the physical parameters of the model.

Regional forecasting centres use the forecast wind field at the sea surface as input to generate high-resolution local sea state (namely, wave) forecasts for operational purposes. However, due to computational constraints, they are often unable to process the full ensemble dataset. For this reason, the original (e.g., 50-member) ensemble may be reduced in size while striving to retain its most significant characteristics and as much of its information content as possible.

This thesis explores the use of machine learning techniques to cluster ensemble members with similar characteristics, thereby reducing redundancy and enabling the selection of so-called Representative Ensemble Members (REMs). The entire work was conducted in collaboration with, and under the supervision of, CNR-ISMAR and the DAIS Department of Ca' Foscari University. If effective, the proposed approach could be integrated into the operational workflow of local probabilistic sea state forecasting, supporting the transition of the sea state forecasting system for the Municipality of Venice (developed by CNR-ISMAR and currently in use by ARPA Veneto and OSMER-FVG) towards a probabilistic, ensemble-based framework (Barbariol et al., 2022).
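As a hedged illustration of how a clustering can yield REMs (not the thesis implementation, whose algorithm and data layout the abstract does not specify), the sketch below flattens each member's field into a vector, groups the members with k-means (an assumption, and only for the case where the number of clusters is fixed in advance), keeps as REM the actual member nearest to each cluster centroid, and weights it by the fraction of the ensemble its cluster contains. The function names `kmeans` and `select_rems`, and the synthetic data, are hypothetical.

```python
import numpy as np


def kmeans(x, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)

    def assign(c):
        # squared Euclidean distance from every member to every centroid
        d2 = ((x[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    # initialise the centroids with k distinct members drawn at random
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = assign(centroids)
        # move each centroid to the mean of its members (keep it if the cluster is empty)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(centroids), centroids


def select_rems(members, k, seed=0):
    """Hypothetical REM selection: one real member per cluster, nearest its centroid.

    members: (n_members, n_features) array, one flattened forecast field per member.
    Returns (rem_indices, weights); each weight is the fraction of members in that cluster.
    """
    labels, centroids = kmeans(members, k, seed=seed)
    rems, weights = [], []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue  # an empty cluster contributes no REM
        d2 = ((members[idx] - centroids[j]) ** 2).sum(axis=1)
        rems.append(int(idx[d2.argmin()]))
        weights.append(idx.size / len(members))
    return np.array(rems), np.array(weights)


# Synthetic stand-in for a 50-member ensemble: 3 "regimes" plus noise,
# 200 grid values per member (purely illustrative, not real forecast data).
rng = np.random.default_rng(42)
regimes = 5 * rng.normal(size=(3, 200))
members = regimes[rng.integers(0, 3, size=50)] + rng.normal(scale=0.5, size=(50, 200))

rems, weights = select_rems(members, k=3)
print("REM indices:", rems, "weights:", weights)
```

Picking a real member rather than the centroid itself is one common design choice: each REM remains a physically consistent forecast that could be used directly to force a wave model, while the weights let the reduced ensemble approximate the probabilities of the full one.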
When the clustering algorithm is allowed to determine the number of clusters on its own, applying it directly to the raw data (pure clustering, our baseline approach) performs well across all selected evaluation metrics, even without any dimensionality reduction during preprocessing. However, Principal Component Analysis (PCA) yields excellent results when the retained explained variance is between 70% and 98.5%: within this interval, it consistently outperforms pure clustering and identifies a higher number of clusters. In this setting, none of the implemented autoencoder architectures provides a clear advantage over the baseline.

When the focus shifts to how the methods depend on specific problem parameters (e.g., the number of clusters and the dimension of the latent space), no single method emerges as superior in all scenarios, nor does any latent space dimension stand out as universally beneficial. The most notable finding is that, for all methods employing dimensionality reduction prior to clustering, when the number of clusters is set between 17 and 20, the ensemble reconstructed from the REMs closely matches the original 50-member ensemble. With only a few exceptions (linked to specific methods and latent dimensions, and likely attributable to dataset variability), these reconstructions consistently outperform those obtained by applying the clustering algorithm directly to the data. Given the practical need to reduce the size of the original ensemble, the ideal target is to retain around ten REMs. Although 17 to 20 REMs may still be too many, these findings provide a promising starting point.

Due to time constraints, this master's thesis only develops the method for selecting the representative wind ensemble members needed to force the wave prediction, without carrying out the wave forecast itself.
Since the research conducted so far remains at a preliminary stage, the next step, already being trialled at CNR-ISMAR in Venice, is to use the selected REMs for sea state forecasting.
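The abstract reports that PCA preprocessing worked best when the retained explained variance was between 70% and 98.5%. As a hedged illustration (not the thesis code), the sketch below shows the usual way such a threshold is applied: compute the principal components of the flattened member fields with NumPy's SVD, then keep the smallest number of components whose cumulative explained-variance ratio reaches the threshold. The function name `pca_reduce` and the synthetic data are assumptions for illustration.

```python
import numpy as np


def pca_reduce(members, explained_variance):
    """Hypothetical helper: project members onto the fewest principal components
    whose cumulative explained-variance ratio reaches `explained_variance` (0-1].

    members: (n_members, n_features) array, one flattened field per member.
    Returns (scores, n_components).
    """
    centred = members - members.mean(axis=0)
    # thin SVD: the rows of vt are the principal directions
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    ratio = s**2 / np.sum(s**2)          # explained-variance ratio per component
    cumulative = np.cumsum(ratio)
    # first index where the cumulative ratio reaches the threshold, plus one
    n_components = min(int(np.searchsorted(cumulative, explained_variance)) + 1, len(s))
    scores = centred @ vt[:n_components].T   # member coordinates in the reduced space
    return scores, n_components


# Synthetic 50-member ensemble: 4 large-scale patterns plus small noise,
# 500 values per member (illustrative only).
rng = np.random.default_rng(0)
patterns = rng.normal(size=(4, 500))
members = rng.normal(size=(50, 4)) @ patterns + 0.1 * rng.normal(size=(50, 500))

results = {}
for threshold in (0.70, 0.90, 0.985):
    scores, n = pca_reduce(members, threshold)
    results[threshold] = (scores, n)
    print(f"explained variance >= {threshold:.3f}: {n} components, scores shape {scores.shape}")
```

In such a pipeline, the reduced scores, rather than the raw fields, would be the input to the clustering step; a higher threshold keeps more components and therefore more of the small-scale detail.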

NIERO, CAMILLA

2024/2025

Record (Dublin Core)

	Faculty/Department
	
				Department of Civil, Environmental and Architectural Engineering - ICEA
			
	Degree programme
	
				MATHEMATICAL ENGINEERING - INGEGNERIA MATEMATICA, Master's degree (Laurea Magistrale, D.M. 270/2004)
			
	Academic year
	
				2024
			
	English title
	
				On the use of machine learning approaches for ensemble prediction of the sea state
			
	Keywords
	
				weather forecast
ensemble prediction
machine learning
wave forecasting
			
	Supervisor
	
				MARTINELLI, LUCA
			
	Co-supervisor
	
				BARBARIOL, FRANCESCO
			
	Appears in collections:
	
				Master's degree theses

Files in this item:

File	Size	Format
master_thesis_niero_camilla_2025.pdf (embargoed until 16/07/2026)	14.91 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive licence. Metadata are released under a CC0 licence.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/89167