Continual Learning: Theoretical and Empirical Analysis of Infinitely Wide Neural Networks
BRECCIA, ALESSANDRO
2023/2024
Abstract
To handle real-world dynamics, an intelligent system must continuously gather, update, accumulate, and utilise knowledge throughout its existence. This capability, termed continual learning, is essential for AI systems to adapt and evolve over time and to avoid re-training models from scratch in order to perform updates. However, a significant challenge in continual learning is catastrophic forgetting, where acquiring new knowledge often leads to a substantial decline in performance on previously learned tasks. The aim of this work is to examine the origin of the catastrophic forgetting phenomenon through the lens of infinitely wide ('overparametrised') neural networks: the special properties emerging from this architectural choice make it possible to gain a deeper understanding of the training dynamics. By analysing the Neural Tangent Kernel \cite{jacotNTK} under two different network parametrisations, the Neural Tangent Parametrisation (NTP) and the Maximal Update Parametrisation ($\mu P$) \cite{yang2TensProgMUP}, we characterise the evolution of the fundamental quantities and kernels governing the training dynamics.
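For readers unfamiliar with the objects named above, the following is a brief sketch of the standard definitions (not taken from the thesis itself). The empirical Neural Tangent Kernel of a network $f_\theta$ is the Gram matrix of its parameter gradients,
\[
\Theta_\theta(x, x') \;=\; \nabla_\theta f_\theta(x)^{\top}\, \nabla_\theta f_\theta(x'),
\]
and under the Neural Tangent Parametrisation, where a layer is written as $h^{l+1} = \tfrac{1}{\sqrt{n_l}}\, W^{l}\, \phi(h^{l})$ with $W^{l}_{ij} \sim \mathcal{N}(0,1)$, this kernel becomes deterministic and, to leading order, constant during training as the widths $n_l \to \infty$, so learning reduces to kernel regression with $\Theta$. The Maximal Update Parametrisation instead rescales initialisation and learning rates so that intermediate features continue to evolve in the infinite-width limit.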
Full text: Breccia_Alessandro.pdf (Adobe PDF, 20.94 MB, open access)
https://hdl.handle.net/20.500.12608/70807