A system-theoretic perspective on Transformers

ZATTRA, RICCARDO
2024/2025

Abstract

This thesis begins with an in-depth analysis of the Transformer architecture, which has become the foundation for a wide range of sequence modeling tasks due to its powerful attention mechanism. Despite its success, recent studies have highlighted certain limitations of attention, particularly in terms of computational efficiency and scalability for long sequences. To address these issues, this work explores an alternative class of models based on State Space Models (SSMs). In particular, the S6 model, a recently proposed SSM, is studied within the context of the MAMBA architecture, which leverages the strengths of state space formulations while aiming to match or exceed the efficiency of Transformers. Following a thorough analysis of S6 and its integration into MAMBA, a novel SSM-based model is introduced. This new model, developed as part of this thesis, demonstrates improved performance over S6 in specific application scenarios. The thesis also features an extensive experimental section, where S6, the proposed model, and other baseline architectures are compared across different benchmarks, with a focus on accuracy, efficiency, and generalization capabilities.
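
To make the computational contrast mentioned above concrete, the following is a minimal, purely illustrative sketch of the classical discretized state space recurrence that underlies this family of models; it is not the S6 or MAMBA implementation studied in the thesis, and the matrix names, dimensions, and parameter values are assumptions chosen only for the example. The point it illustrates is that the recurrence processes a length-L sequence in a single O(L) pass, whereas full self-attention compares all pairs of positions at O(L^2) cost.

# Illustrative sketch (not the thesis's S6/Mamba code): a generic discrete
# linear state space model x_{k+1} = A x_k + B u_k, y_k = C x_k applied to a
# sequence, one step per time index.
import numpy as np

def ssm_scan(A, B, C, u):
    """A: (n, n), B: (n, m), C: (p, n), u: (L, m). Returns y: (L, p)."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:               # single pass over the sequence: O(L) steps
        x = A @ x + B @ u_k     # state update
        ys.append(C @ x)        # readout
    return np.stack(ys)

# Example with assumed sizes: a stable 4-state SSM on a length-1000 scalar input.
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4) + 0.01 * rng.standard_normal((4, 4))
B = rng.standard_normal((4, 1))
C = rng.standard_normal((1, 4))
u = rng.standard_normal((1000, 1))
y = ssm_scan(A, B, C, u)        # shape (1000, 1)
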
Keywords: Transformer; State space models; Generative AI
Files in this item:
File: Zattra_Riccardo.pdf (under embargo until 14/03/2027)
Size: 1.22 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. The full text is published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/90729