Physics-Informed Machine Learning for High-Fidelity Synthetic Data Generation

NINNI, DANIELE

2022/2023

Abstract

Interest in synthetic data has grown rapidly in recent years. Synthetic data is artificially generated data that reproduces the statistical properties of real-world data. This growth can be attributed, on the one hand, to the increasing demand for large amounts of data to train AI/ML models and, on the other hand, to the recent development of effective methods for generating high-quality synthetic data. For example, generative AI models have demonstrated excellent capabilities in synthesizing complex datasets. Unfortunately, many of the processes of interest are rare events or edge cases, so the amount of real data available to train generative models is often insufficient, which limits their applicability. Furthermore, for processes involving dynamical systems, generative models often fail to capture the laws governing the dynamics, which results in low-fidelity synthetic data. A possible strategy to overcome these limitations is to generate synthetic data with a physics-informed approach, that is, by incorporating knowledge of the governing physical laws into the generative model. This thesis explores one such approach: it uses the SINDy Autoencoder network introduced by Champion et al. as a synthetic data generator. The approach is benchmarked against a commercial tool developed by Clearbox AI, a synthetic data provider. The generative models under study are tested on two datasets generated by nonlinear dynamical systems: a simulated dataset with dynamics defined by the Lorenz system and a real dataset acquired on a full-scale F-16 aircraft. The results show that the explored approach is a promising solution for generating high-fidelity synthetic data. However, the training procedure is significantly complicated by the presence of multiple competing loss terms. Moreover, the effectiveness of the approach appears to depend strongly on the dataset in use and on the complexity of the corresponding dynamical system.
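
As a rough illustration of what "incorporating the governing laws" can mean for the Lorenz benchmark, the sketch below simulates a Lorenz trajectory and then recovers its equations by sparse regression over a library of candidate terms, which is the SINDy idea that the SINDy Autoencoder builds on. It is not the thesis code and not the autoencoder itself: there is no neural network, the derivatives are taken as exact, and the parameter values (sigma = 10, rho = 28, beta = 8/3), the initial condition, the quadratic library, the threshold, and all function names are assumptions chosen for the example.

```python
import numpy as np

def lorenz(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz system (assumed standard parameters)."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def simulate(x0, dt=0.002, steps=5000):
    """Integrate the Lorenz system with a fixed-step fourth-order Runge-Kutta scheme."""
    traj = np.empty((steps, 3))
    state = np.asarray(x0, dtype=float)
    for i in range(steps):
        traj[i] = state
        k1 = lorenz(state)
        k2 = lorenz(state + 0.5 * dt * k1)
        k3 = lorenz(state + 0.5 * dt * k2)
        k4 = lorenz(state + dt * k3)
        state = state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj

def library(X):
    """Candidate terms up to degree 2: 1, x, y, z, x^2, xy, xz, y^2, yz, z^2."""
    x, y, z = X.T
    return np.column_stack(
        [np.ones(len(X)), x, y, z, x * x, x * y, x * z, y * y, y * z, z * z]
    )

def stlsq(Theta, dX, threshold=0.1, iters=10):
    """Sequentially thresholded least squares: fit, zero small terms, refit."""
    Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(Xi) < threshold
        Xi[small] = 0.0
        for k in range(dX.shape[1]):
            big = ~small[:, k]
            if big.any():
                Xi[big, k] = np.linalg.lstsq(Theta[:, big], dX[:, k], rcond=None)[0]
    return Xi

X = simulate([-8.0, 7.0, 27.0])   # trajectory, shape (5000, 3)
dX = lorenz(X.T).T                # exact derivatives along the trajectory
Xi = stlsq(library(X), dX)        # sparse coefficient matrix, shape (10, 3)
```

Each column of `Xi` holds the coefficients of one equation (dx/dt, dy/dt, dz/dt) over the ten library terms. On real measurements, such as the F-16 data mentioned above, the derivatives would have to be estimated from noisy samples and suitable coordinates are not known in advance, which is the gap the autoencoder part of the method is meant to address.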

Record

Faculty/Department: Dipartimento di Fisica e Astronomia "Galileo Galilei" - DFA
Degree programme: PHYSICS OF DATA, Laurea Magistrale (D.M. 270/2004)
Academic year: 2022
English title: Physics-Informed Machine Learning for High-Fidelity Synthetic Data Generation
Keywords: physics-informed; machine learning; synthetic data
Supervisor: ZANETTI, MARCO
Appears in collections: Master's theses (Lauree magistrali)

Files in this item:

File	Size	Format
ninni_daniele.pdf (open access)	23.48 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/47364