Interpreting Disordered Proteins using Sparse Auto-Encoders and Protein Language Models
BACAKSIZ, ONUR
2025/2026
Abstract
Protein language models have become powerful tools for representing protein sequences and supporting a wide range of computational biology tasks. Despite their strong performance, the internal representations learned by these models remain difficult to interpret, especially for intrinsically disordered proteins. This thesis investigates how protein language models encode information related to intrinsically disordered regions and explores whether these representations can be interpreted in a biologically meaningful way. To this end, sparse autoencoders are applied to intermediate representations extracted from a protein language model. By enforcing sparsity, the autoencoder decomposes high-dimensional model activations into a set of latent features that can be analyzed individually. The study focuses on residue-level representations and examines the relationship between sparse latent activations and biological annotations associated with protein disorder. Statistical analyses and linear probing methods are employed to evaluate whether specific latent features capture signals relevant to intrinsically disordered regions. This work follows an exploratory and interpretability-driven approach. Different model layers and sparse representations are compared in order to better understand how disorder-related information emerges within protein language models. The results provide insights into the structure of learned representations and highlight the potential of sparse autoencoders as a tool for improving interpretability in computational protein analysis. Overall, this thesis contributes to the interpretation of representations related to intrinsically disordered proteins by using different machine learning models and latent features.
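The pipeline described above — extracting residue-level activations, decomposing them with a sparse autoencoder, and probing the latents for disorder signal — can be sketched minimally as follows. This is an illustrative sketch only, not the thesis's actual implementation: the embeddings and disorder labels are synthetic placeholders, the SAE weights are random and untrained (in practice they would be learned with a reconstruction loss plus an L1 sparsity penalty on the latents), and the ridge-regression probe stands in for whichever linear probing method the thesis uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical residue-level activations from a protein language model:
# n residues, each a d-dimensional vector (values here are synthetic).
n, d, k = 200, 64, 256          # k > d: an overcomplete SAE latent space
X = rng.normal(size=(n, d))

# Sparse autoencoder forward pass (sketch). A ReLU encoder maps each
# activation into k latent features; during training, an L1 penalty on z
# would push most latents to zero so each can be inspected individually.
# Weights below are random placeholders, not trained parameters.
W_enc = rng.normal(scale=1.0 / np.sqrt(d), size=(d, k))
b_enc = rng.normal(size=k)
W_dec = rng.normal(scale=1.0 / np.sqrt(k), size=(k, d))

z = np.maximum(0.0, X @ W_enc + b_enc)   # sparse latents, one row per residue
X_hat = z @ W_dec                        # reconstruction of the activations

# Linear probe (ridge regression, closed form) from latents to a binary
# per-residue disorder annotation (labels are synthetic for illustration).
y = rng.integers(0, 2, size=n).astype(float)
lam = 1.0
w = np.linalg.solve(z.T @ z + lam * np.eye(k), z.T @ y)
scores = z @ w   # probe scores; thresholding gives per-residue predictions

print(z.shape, X_hat.shape, scores.shape)
```

Inspecting which entries of `w` carry the most weight, or which individual latents correlate with the disorder labels, mirrors the per-feature statistical analysis the abstract describes.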
https://hdl.handle.net/20.500.12608/108224