
Transcriptomic Neural Networks Architecture and Applications to Functional and Aging Research

PINAROLI, ANDREA
2024/2025

Abstract

Foundation models have become key to Large Language Model (LLM) architectures, leveraging the vast corpus of text available on the internet. Advances in transcriptomic foundation models (TFMs) and exponentially increasing data availability are driving the same trend in biology. Here the authors describe scFoundation, the largest TFM in the literature, pretrained on 50 million single-cell transcriptomic profiles and totalling 100 million parameters. A transformer-like asymmetric encoder-decoder architecture was trained on a read-depth-aware (RDA) de-masking task. The model has been applied to several downstream tasks, showing that its improved generalization yields better performance across gene, cell, and cell line domains. State-of-the-art performance was shown for read-depth enhancement, drug response prediction, cell type annotation, gene perturbation response prediction, and gene module and gene regulatory network (GRN) inference.
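
To make the pretraining objective concrete, the following is a minimal sketch of how a read-depth-aware (RDA) de-masking training pair could be constructed: a cell's count profile is binomially thinned to simulate a shallower read depth, a fraction of genes is masked, and the model would be trained to recover the original full-depth counts at the masked positions. All function and variable names here are illustrative assumptions, not taken from the scFoundation codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rda_pair(counts, downsample_frac=0.5, mask_frac=0.3):
    """Build (input, target, mask) for one cell's gene-count vector."""
    # Simulate a shallower read depth by binomial thinning of the raw counts.
    low_depth = rng.binomial(counts, downsample_frac)
    # Randomly mask a fraction of genes; masked inputs are zeroed out and the
    # model would be asked to predict the original (full-depth) counts there.
    mask = rng.random(counts.shape) < mask_frac
    model_input = np.where(mask, 0, low_depth)
    return model_input, counts, mask

counts = rng.poisson(2.0, size=8)        # toy expression profile (8 genes)
x, target, mask = make_rda_pair(counts)
# A reconstruction loss would be computed on the masked positions only, e.g.:
masked_error = np.abs(target[mask] - x[mask])
```

The thinning step is what makes the task read-depth aware: the model sees inputs at varying sequencing depths but is always supervised against the full-depth profile, which is what enables the read-depth enhancement application mentioned above.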
Keywords: Neural Networks, Aging Clocks, TFMs
Files in this item:
Pinaroli_Andrea.pdf: open access, 6.22 MB, Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 License.

Use this identifier to cite or link to this item: https://hdl.handle.net/20.500.12608/91971