Continual Learning: Theoretical and Empirical Analysis of Infinitely Wide Neural Networks

BRECCIA, ALESSANDRO
2023/2024

Abstract

To handle real-world dynamics, an intelligent system must continuously gather, update, accumulate, and utilise knowledge throughout its existence. This capability, termed continual learning, is essential for AI systems to adapt and evolve over time without re-training models from scratch whenever an update is needed. However, a significant challenge in continual learning is catastrophic forgetting, where acquiring new knowledge often leads to a substantial decline in performance on previously learned tasks. The aim of this work is to examine the origin of catastrophic forgetting through the lens of infinitely wide ('overparametrised') neural networks: the special properties that emerge from this architectural choice make it possible to obtain a deeper understanding of the training dynamics. By analysing the Neural Tangent Kernel \cite{jacotNTK} under two different network parametrisations, the Neural Tangent Parametrisation (NTP) and the Maximal Update Parametrisation ($\mu P$) \cite{yang2TensProgMUP}, we characterise the evolution of the fundamental quantities and kernels governing the training dynamics.
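For context, a standard statement of the Neural Tangent Kernel referenced in the abstract (following \cite{jacotNTK}) is sketched below; here $f(x;\theta)$ denotes the network output on input $x$ and $\theta_t$ the trainable parameters at training time $t$.

\[
\Theta_t(x, x') \;=\; \nabla_\theta f(x;\theta_t)^{\top}\, \nabla_\theta f(x';\theta_t)
\]

Under the Neural Tangent Parametrisation this kernel becomes deterministic and constant in time as the width tends to infinity, so gradient-descent training reduces to kernel regression with a fixed $\Theta$; under $\mu P$ the kernel continues to evolve during training, which is what makes the comparison between the two parametrisations informative for the dynamics studied here.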
Keywords: machine learning, statistical physics, dynamical mean field, overparametrization, deep learning
File: Breccia_Alessandro.pdf (Adobe PDF, 20.94 MB, open access)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/70807