Reproducibility and Generalization of a Relation Extraction System for Gene-Disease Associations

Biomedical literature is a rich source of information on Gene-Disease Associations (GDAs) that could help physicians make clinical decisions and improve patient care. GDAs are publicly available in databases that record relationships between gene/miRNA expression and diseases, such as specific types of cancer. Most of these resources, such as DisGeNET, miR2Disease and BioXpress, also include data manually curated from publications. However, human annotation is expensive and cannot scale to the huge amount of data available in the scientific literature (e.g., biomedical abstracts). Therefore, developing automated tools to identify GDAs is gaining traction in the community. Such systems employ Relation Extraction (RE) techniques to extract information on gene/miRNA expression in diseases from text. Once an automated text-mining tool has been developed, it can be evaluated on human-annotated data or compared against state-of-the-art systems. In this work we reproduce DEXTER, a system that automatically extracts GDAs from biomedical abstracts. The goal is to provide a benchmark for future RE work, enabling researchers to test and compare their results. The implemented version of DEXTER is available in the following git repository: https://github.com/mntlra/DEXTER.


MENOTTI, LAURA

2021/2022


Record details (Dublin Core)

Faculty/Department: Dipartimento di Ingegneria dell'Informazione - DEI (Department of Information Engineering)
Degree programme: Computer Engineering, Laurea Magistrale (Master's degree, D.M. 270/2004)
Academic year: 2021
English title: Reproducibility and Generalization of a Relation Extraction System for Gene-Disease Associations
Supervisor: SILVELLO, GIANMARIA
Co-supervisor: MARCHESIN, STEFANO
Appears in collections: Master's theses (Lauree magistrali)

Files in this item: Menotti_Laura.pdf (open access, 3.42 MB, Adobe PDF)

The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/35579