Transcriptomic Neural Networks Architecture and Applications to Functional and Aging Research
PINAROLI, ANDREA
2024/2025
Abstract
Foundation models have become key to Large Language Model (LLM) architectures, leveraging the vast corpus of text available on the internet. Advances in transcriptomic foundation models (TFMs) and exponentially increasing data availability are driving the same trend in biology. Here the authors describe scFoundation, the largest TFM in the literature, pretrained on 50 million single-cell transcriptomic profiles and totalling 100 million parameters. A transformer-like asymmetric encoder-decoder architecture was trained on a read-depth-aware (RDA) de-masking task. The model has been applied to several downstream tasks, showing that its improved generalization yields better performance across gene, cell, and cell-line domains. State-of-the-art performance was shown for read-depth enhancement, drug response prediction, cell type annotation, gene perturbation response prediction, and gene module and gene regulatory network (GRN) inference.
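The read-depth-aware (RDA) de-masking objective can be made concrete with a short sketch: a cell's raw count vector is optionally downsampled to simulate a lower read depth, two total-count indicators (the input depth S and the target depth T) are attached, and a random subset of genes is masked for the model to recover. The function below is a minimal illustration of this setup rather than the published implementation; the name `rda_training_pair`, the binomial thinning, and the `-1.0` mask sentinel are assumptions made for this sketch.

```python
import numpy as np

def rda_training_pair(raw_counts, mask_frac=0.3, downsample_p=None, rng=None):
    """Build one read-depth-aware (RDA) de-masking training example.

    Hedged sketch of the RDA task described in the abstract; the exact
    masking scheme and downsampling recipe of scFoundation may differ.
    """
    rng = rng or np.random.default_rng()
    raw_counts = np.asarray(raw_counts)

    # Optionally simulate a lower read depth by binomial thinning of the
    # raw counts (an assumed, common way to model reduced sequencing depth).
    if downsample_p is not None and downsample_p < 1.0:
        input_counts = rng.binomial(raw_counts.astype(int), downsample_p)
    else:
        input_counts = raw_counts.copy()

    # Total-count indicators: S is the depth of the (possibly downsampled)
    # input, T the depth of the raw target profile.
    S = input_counts.sum()
    T = raw_counts.sum()

    # Randomly mask a fraction of genes; the training objective is to
    # recover the raw values at these masked positions.
    mask = rng.random(raw_counts.shape[0]) < mask_frac
    masked_input = input_counts.astype(float)
    masked_input[mask] = -1.0  # sentinel marking "masked" positions

    return masked_input, S, T, raw_counts, mask
```

In this framing, the read-depth-enhancement application follows directly: at inference time a low-depth profile is fed in with T set larger than S, asking the model to predict the expression profile it would expect at the higher depth.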
| File | Size | Format | Access |
|---|---|---|---|
| Pinaroli_Andrea.pdf | 6.22 MB | Adobe PDF | Open access |
https://hdl.handle.net/20.500.12608/91971