Entity Extraction and Linking for Digital Pathology

GALIAZZO, RICCARDO
2021/2022

Abstract

In recent years, huge amounts of biomedical data have been produced. The rich information content of such data could be exploited for several purposes, including diagnostics and support for the medical decision-making process. Nevertheless, most of this information is still stored in unstructured formats, such as the free-text narrative clinical reports and clinical notes saved in Electronic Health Records (EHRs). Hence, these documents are human-readable but not machine-readable. Although some Laboratory Information Systems (LISs) support structured data and synoptic reports, the adoption of structured and machine-readable formats is still limited. This hinders the full exploitation of computational approaches for data analysis, pattern recognition, and secondary use in general. To mitigate this, knowledge extraction methods can be used to automatically extract meaningful information from biomedical text written in natural language. In this thesis, we tackle these issues by investigating the application of different knowledge extraction techniques to free-text clinical reports from the digital pathology domain. First, we manually defined curated ground truths containing all the relevant information extracted from a set of clinical reports. Second, we implemented several state-of-the-art techniques for knowledge extraction. Then, we evaluated the performance of these algorithms against the ground truths. The analyses show that the effectiveness of knowledge extraction algorithms depends on the variability of the pathology reports examined and on the kind of entities to extract. Consequently, the precision and recall of most of the algorithmic approaches considered vary significantly.
2021
Keywords: Entity, Extraction, Linking, Digital, Pathology
Files in this item:
File: Galiazzo_Riccardo.pdf
Access: open access
Size: 12.03 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/40261