Phylogenetic and Neural Network-Based Zoonotic Risk Assessment of Hepatitis E Virus

BELOUS, ANNA
2024/2025

Abstract

Hepatitis E virus (HEV), particularly genotypes 3 and 4, represents a significant zoonotic threat with complex transmission dynamics involving humans, domestic animals, and wildlife. This thesis explores the application of neural networks to predict the host origin of HEV based on viral genomic sequences, utilizing a bioinformatic approach centered on k-mer encoding of nucleotide data. A curated dataset of 448 complete HEV genomes with verified host metadata was assembled, focusing on zoonotic genotypes 3 and 4 to uncover subtle genomic signatures indicative of host tropism. A fully connected feed-forward neural network was developed and trained on normalized 6-mer frequency vectors, with performance evaluated through stratified 5-fold cross-validation and independent testing. The model achieved moderate predictive accuracy (~20–70%), excelling in classifying well-represented hosts such as humans and rats, while struggling with underrepresented wildlife hosts due to data imbalance and biological similarity. Phylogenetic analysis corroborated the zoonotic potential of HEV, revealing frequent cross-species transmissions and highlighting critical transmission pathways including foodborne, occupational, and environmental routes. This study demonstrates the feasibility of integrating machine learning with viral genomics to enhance zoonotic risk assessment. Despite current limitations related to data scarcity and class imbalance, the findings emphasize the promise of computational methods for improving surveillance and control strategies for HEV outbreaks. Future work focusing on data enrichment, hierarchical classification, and incorporation of richer genomic features could further advance predictive accuracy and public health impact.
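As a rough illustration of the encoding and evaluation setup described in the abstract, the sketch below turns a nucleotide sequence into a normalized k-mer frequency vector (4^6 = 4096 features for k = 6) and assigns samples to stratified folds. It is not the thesis implementation: the function names `kmer_frequencies` and `stratified_folds`, the decision to skip windows containing ambiguous bases, and the round-robin fold assignment are all assumptions made for the example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(sequence, k=6):
    """Return a normalized k-mer frequency vector of length 4**k (hypothetical helper)."""
    seq = sequence.upper()
    # Fixed, lexicographically ordered vocabulary so every genome shares one feature layout.
    vocabulary = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Assumption: windows with ambiguous bases (e.g. N) are ignored, not redistributed.
    valid_total = sum(counts[kmer] for kmer in vocabulary)
    if valid_total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / valid_total for kmer in vocabulary]


def stratified_folds(labels, n_folds=5):
    """Assign a fold index per sample so each host class is spread across folds.

    Assumption: deterministic round-robin within each class; a real pipeline
    would typically shuffle samples first.
    """
    seen = Counter()
    folds = []
    for label in labels:
        folds.append(seen[label] % n_folds)
        seen[label] += 1
    return folds


if __name__ == "__main__":
    vector = kmer_frequencies("ACGTACGTACGTTTGACA", k=6)
    print(len(vector))  # 4096 features for k = 6
    print(stratified_folds(["human", "human", "rat", "human", "rat"], n_folds=2))
```

With only 448 genomes against 4096 input features, a fixed vocabulary and per-class fold assignment matter: every fold sees the same feature layout, and rare host classes are not concentrated in a single fold. In practice a library routine such as scikit-learn's `StratifiedKFold` would usually replace the hand-written fold assignment.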
2024
Keywords: HEV genotype 3; HEV genotype 4; Zoonosis; Host; Neural network
Files in this item:
Belous_Anna.pdf (open access, 567.3 kB, Adobe PDF)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/91261