Interpreting Disordered Proteins using Sparse Auto-Encoders and Protein Language Models

BACAKSIZ, ONUR
2025/2026

Abstract

Protein language models have become powerful tools for representing protein sequences and supporting a wide range of computational biology tasks. Despite their strong performance, the internal representations learned by these models remain difficult to interpret, especially for intrinsically disordered proteins. This thesis investigates how protein language models encode information related to intrinsically disordered regions and explores whether these representations can be interpreted in a biologically meaningful way. To this end, sparse autoencoders are applied to intermediate representations extracted from a protein language model. By enforcing sparsity, the autoencoder decomposes high-dimensional model activations into a set of latent features that can be analyzed individually. The study focuses on residue-level representations and examines the relationship between sparse latent activations and biological annotations associated with protein disorder. Statistical analyses and linear probing methods are employed to evaluate whether specific latent features capture signals relevant to intrinsically disordered regions. This work follows an exploratory, interpretability-driven approach: different model layers and sparse representations are compared to better understand how disorder-related information emerges within protein language models. The results provide insights into the structure of learned representations and highlight the potential of sparse autoencoders as a tool for improving interpretability in computational protein analysis. Overall, this thesis contributes to the interpretation of representations related to intrinsically disordered proteins by using different machine learning models and latent features.
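The abstract does not specify which sparse autoencoder variant or probe the thesis uses, so the following is only a minimal sketch of the general pipeline it describes: a top-k-style sparse autoencoder decomposes residue-level activations into sparse latent features, and a linear probe is fit on those latents against (here, synthetic) disorder labels. All dimensions, weights, and labels below are hypothetical placeholders, not values from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, k = 32, 128, 8   # hypothetical pLM hidden size, SAE width, sparsity level

# Randomly initialized encoder/decoder weights (an actual SAE would train these
# to minimize reconstruction error under the sparsity constraint).
W_enc = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_dec = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)

def encode_topk(x):
    """ReLU encoding followed by a top-k mask: only the k strongest
    latent features per residue stay active (the sparsity constraint)."""
    z = np.maximum(x @ W_enc, 0.0)
    thresh = np.sort(z, axis=-1)[:, -k][:, None]  # k-th largest value per row
    return np.where(z >= thresh, z, 0.0)

x = rng.standard_normal((10, d_model))      # stand-in for 10 residue-level activations
z = encode_topk(x)                          # sparse latent features, at most k active each
recon = z @ W_dec                           # reconstruction back into model space

# Linear probing sketch: fit a least-squares linear map from sparse latents
# to binary disorder annotations (synthetic labels here).
labels = rng.integers(0, 2, size=10).astype(float)
w, *_ = np.linalg.lstsq(z, labels, rcond=None)
preds = z @ w
```

Because each residue activates at most k of the 128 latents, individual latent dimensions can then be inspected one at a time, e.g. by correlating a single column of `z` with disorder annotations, which is the kind of per-feature analysis the abstract refers to.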
Sparse Autoencoders
Protein Language Models
Protein Disorder
Interpretability
Computational Biology


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/108224