Continual Learning: Theoretical and Empirical Analysis of Infinitely Wide Neural Networks

BRECCIA, ALESSANDRO
2023/2024

Abstract

To handle real-world dynamics, an intelligent system must continuously gather, update, accumulate, and utilise knowledge throughout its existence. This capability, termed continual learning, is essential for AI systems to adapt and evolve over time without re-training models from scratch whenever an update is needed. However, a significant challenge in continual learning is catastrophic forgetting, where acquiring new knowledge often leads to a substantial decline in performance on previously learned tasks. The aim of this work is to examine the origin of catastrophic forgetting through the lens of infinitely wide ('overparametrised') neural networks: the special properties that emerge from this architectural choice make it possible to obtain a deeper understanding of the training dynamics. By analysing the Neural Tangent Kernel \cite{jacotNTK} under two different network parametrisations, the Neural Tangent Parametrisation (NTP) and the Maximal Update Parametrisation ($\mu P$) \cite{yang2TensProgMUP}, we characterise the evolution of the fundamental quantities and kernels governing the training dynamics.
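For context, a standard statement of the Neural Tangent Kernel referenced in the abstract (following \cite{jacotNTK}) is sketched below; here $f(x;\theta)$ denotes the network output on input $x$ and $\theta_t$ the trainable parameters at training time $t$.

\[
\Theta_t(x, x') \;=\; \nabla_\theta f(x;\theta_t)^{\top}\, \nabla_\theta f(x';\theta_t)
\]

Under the Neural Tangent Parametrisation this kernel becomes deterministic and constant in time as the width tends to infinity, so gradient-descent training reduces to kernel regression with a fixed $\Theta$; under $\mu P$ the kernel continues to evolve during training, which is what makes the comparison between the two parametrisations informative for the dynamics studied here.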
Keywords: machine learning, statistical physics, dynamical mean field, overparametrization, deep learning
File: Breccia_Alessandro.pdf (Adobe PDF, 20.94 MB, open access)

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/70807