Improving Clinical Trial Information Retrieval: Exploring the Role of Natural Language Processing and Retrieval-Augmented Generation

MAHFOUZ, AURIANE
2023/2024

Abstract

This thesis aims to develop a chatbot-based solution that processes publicly available information and presents it in a clear, digestible format, improving the clinical trial experience for both participants and researchers. By retrieving and summarizing relevant data, the chatbot supports key areas such as patient recruitment, eligibility, and access to trial information. The research begins by examining the current state of clinical trials and identifying where AI can add value. It then addresses the challenges of accessing and understanding trial-related information, with the aim of giving users clear and useful insights drawn from publicly available data. The study investigates how Retrieval-Augmented Generation (RAG), combined with Natural Language Processing (NLP), vector databases, and prompt engineering, can improve document retrieval and streamline clinical trial processes. To evaluate the solution, question-answer pairs are created, and the generated answers are scored against the reference answers using automated metrics such as BLEU, METEOR, and ROUGE. The solution's performance is compared with that of previous models, its errors are analyzed, and improvements are proposed. Several techniques, including different RAG approaches, prompt engineering, and query enhancement, are tested to identify the most effective methods, and the comparison is supported by quantitative results.
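The retrieve-then-generate loop and the overlap-based scoring described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis implementation. The abstract does not name the embedding model, vector database, or language model used, so the sketch makes three substitutions:

- term-frequency vectors stand in for neural embeddings;
- a sorted Python list stands in for a vector database;
- the LLM call is omitted.

The helper names (`embed`, `retrieve`, `build_prompt`, `rouge1_f1`) and the sample trial text are hypothetical. `rouge1_f1` is a simplified ROUGE-1 with no stemming or stopword handling.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters/digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a sparse term-frequency vector."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Retrieval step of RAG: return the k chunks most similar to the query.
    A sorted list replaces the vector database assumed in a real system."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Prompt engineering step: ground the generator in the retrieved context."""
    ctx = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{ctx}\n\nQuestion: {query}\nAnswer:"
    )


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap with a reference answer."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # min count per shared token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Usage with invented trial snippets (illustrative only).
chunks = [
    "Eligibility: adults aged 18 to 65 with type 2 diabetes.",
    "Exclusion: pregnant or breastfeeding participants.",
    "Primary outcome: change in HbA1c after 24 weeks.",
]
question = "What is the eligibility age range?"
context = retrieve(question, chunks, k=1)
prompt = build_prompt(question, context)
# In a full pipeline, `prompt` would be sent to a language model here.
generated = "Adults aged 18 to 65 with type 2 diabetes are eligible."
reference = "Eligible participants are adults aged 18 to 65 with type 2 diabetes."
score = rouge1_f1(generated, reference)
print(round(score, 3))  # 0.957
```

Replacing `embed` with a sentence-embedding model and the list with a vector database would change retrieval quality but not the shape of the pipeline. BLEU and METEOR would be computed at the same point as `rouge1_f1`, typically with an established library rather than by hand.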
2023
Keywords: RAG, NLP, Language models, Vector databases, Query
Files in this item:
Auriane_Mahfouz_MSc_Thesis.pdf (restricted access, 3.48 MB, Adobe PDF)

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/80894