A Dual Approach to Plastic Biodegradation: From Metagenomic Assemblies to Predictive Enzyme Discovery

AGOSTINI, FLAVIO
2024/2025

Abstract

The widespread accumulation of micro- and nano-plastics in the environment represents an escalating ecological threat, necessitating innovative strategies for mitigation. Microbial plastic degradation offers a promising solution; however, the discovery of relevant enzymes remains challenging due to limited annotation of emerging or latent degradation capabilities. One way to address this challenge lies in the growing availability of biological data, driven especially by the rise of environmental metagenomics. This wealth of data offers unprecedented opportunities to explore microbial diversity and discover novel plastic-degrading functions. In this study, we present an integrative approach that combines genome-centric metagenomic analysis of plastic-enriched bacterial communities with semi-supervised machine learning to identify novel plastic-degrading enzymes. This approach was optimized for scenarios involving multiple longitudinal samples collected from distinct microbial communities, each adapted to different plastic substrates and physical forms. As a case study, this research focuses on communities adapted to polyethylene, polyethylene terephthalate, and polyurethane substrates, providing a diverse experimental landscape for evaluating enzyme discovery across polymer types and conformations. Genome-centric metagenomic analysis was performed through a comprehensive pipeline including multiple rounds of genome binning with different tools, followed by bin refinement and dereplication. To ensure optimal genome quality for downstream analyses, two distinct assembly strategies were empirically tested, leading to an improved genome recovery rate across all bacterial communities. This pipeline thus enabled the identification of candidate organisms with potential plastic-degrading capabilities for further investigation using machine learning.
The proteome of one such organism, Rhodococcus aetherivorans, was analyzed using a label propagation model trained on both sequence- and structure-based features, including ESM-2-derived embeddings and AlphaFold2-generated graph representations. The model showed high estimated performance in recovering known plastic-degrading enzymes and was subsequently applied to proteins derived from metagenomic data. By comparing its predictions with those of traditional homology-based methods, and by assessing the structural similarity of the predicted proteins to known enzymes, this study identified candidate enzymes for experimental validation. These findings highlight the potential of integrating metagenomics with machine learning for functional enzyme discovery and advancing biotechnological solutions to plastic pollution.
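
As a rough illustration of the semi-supervised idea mentioned in the abstract, the sketch below propagates labels from a few annotated proteins to unannotated ones over a similarity graph. It is not the implementation used in the thesis: the RBF similarity, the toy 2-D vectors standing in for protein embeddings, and the names `rbf_similarity` and `label_propagation` with their parameters are all assumptions made for this example.

```python
import math

def rbf_similarity(a, b, gamma=1.0):
    """Gaussian (RBF) similarity between two feature vectors (assumed kernel)."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * d2)

def label_propagation(features, labels, gamma=1.0, n_iter=200):
    """Propagate binary labels over a fully connected similarity graph.

    labels: 1 = known positive, 0 = known negative, None = unlabeled.
    Returns one score in [0, 1] per item; labeled items stay clamped.
    """
    n = len(features)
    # Pairwise edge weights, with no self-loops.
    weights = [
        [rbf_similarity(features[i], features[j], gamma) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    # Row-normalise so each node averages over its neighbours.
    transition = []
    for row in weights:
        total = sum(row)
        transition.append([w / total for w in row] if total > 0 else row)
    # Unlabeled nodes start at an uninformative 0.5.
    scores = [0.5 if lab is None else float(lab) for lab in labels]
    for _ in range(n_iter):
        updated = [sum(transition[i][j] * scores[j] for j in range(n))
                   for i in range(n)]
        # Clamp the labeled nodes back to their known values.
        scores = [updated[i] if labels[i] is None else float(labels[i])
                  for i in range(n)]
    return scores

# Toy data: two well-separated groups, one labeled example in each.
toy_features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [3.0, 3.0], [3.1, 2.9]]
toy_labels = [1, None, None, 0, None]
toy_scores = label_propagation(toy_features, toy_labels)
```

In this toy setting the unlabeled points end up with scores close to the label of the nearest annotated point, which is the behaviour a label propagation model relies on when only a handful of enzymes are annotated.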
2024
METAGENOMICS
SEMI-SUPERVISED LEAR
PLASTIC DEGRADATION
MICROBIAL ENZYMES
DATA INTEGRATION
Files in this item:
File: flavio_agostini_thesis.pdf (open access)
Size: 8.78 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/89821