Improving Clinical Trial Information Retrieval: Exploring the Role of Natural Language Processing and Retrieval-Augmented Generation
MAHFOUZ, AURIANE
2023/2024
Abstract
This thesis aims to develop a chatbot-based solution that processes publicly available information and presents it in a clear, digestible format, improving the clinical trial experience for both participants and researchers. By retrieving and summarizing relevant data, the chatbot supports key areas such as patient recruitment, eligibility, and access to trial information. The research begins by examining the current state of clinical trials and identifying areas where AI can provide value. It also addresses current challenges in accessing and understanding trial-related information, so that users receive clear and useful insights from publicly available data. The study investigates how Retrieval-Augmented Generation (RAG), combined with Natural Language Processing (NLP), vector databases, and prompt engineering, can enhance document retrieval and streamline clinical trial processes. To evaluate the solution, question-answer pairs are created and the generated answers are scored for correctness using automated metrics such as BLEU, METEOR, and ROUGE. The solution’s performance is compared with that of previous models, errors are analyzed, and improvements are proposed. Various techniques, including RAG approaches, prompt engineering, and query enhancement, are tested to identify the most effective methods, with the comparison supported by numerical evidence.

| File | Size | Format |
|---|---|---|
| Auriane_Mahfouz_MSc_Thesis.pdf (restricted access) | 3.48 MB | Adobe PDF |
The text of this website is © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/80894