Biomedical Question Answering: Extending GeneGPT with the code act paradigm

ABEDINI, KIMIA

2024/2025

Abstract

Understanding genomic information is fundamental to biomedical research, precision medicine, and therapeutic development, but the complexity and distributed nature of genomic databases make efficient knowledge retrieval difficult. Although Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, their application to specialized genomic question answering remains limited by challenges in accessing domain-specific databases, maintaining factual accuracy, and integrating information from heterogeneous sources. To address these limitations, we introduce GenomAgent, a multi-agent framework for genomic question answering in which coordinated autonomous agents process each query in stages. A task classification agent identifies the query type and extracts the relevant genomic terms; a distributed information retrieval mechanism then queries multiple data sources through Model Context Protocol (MCP) web browser integration. Dedicated processing agents consolidate and clean the retrieved information before a final synthesis agent generates a comprehensive, contextualized response. We evaluated GenomAgent against GeneGPT, a pioneering system that answers genomic questions through NCBI API calls, on established benchmark datasets. Our results show that GenomAgent matches or exceeds GeneGPT on all evaluated tasks while offering greater flexibility through its modular architecture. Because the methodology adapts to various data sources, including API-based services and HTML text extraction, it is a generalizable framework applicable to question answering in domains beyond genomics. This work establishes a foundation for scalable, agent-based approaches to specialized knowledge retrieval in data-intensive scientific fields.
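
The staged flow described in the abstract (classify, retrieve, consolidate, synthesize) can be illustrated with a minimal Python sketch. This is not the thesis implementation: every name here (Task, classify, retrieve, consolidate, synthesize, answer, the "gene_alias" label, and the two stub sources) is hypothetical, the "agents" are plain functions rather than LLM calls, and the data sources are stubs that never touch MCP, a browser, or NCBI.

```python
from dataclasses import dataclass
from typing import Callable

# A source maps a genomic term to raw text. Real sources (APIs, HTML pages)
# are replaced by stubs in this sketch.
Source = Callable[[str], str]


@dataclass
class Task:
    query_type: str   # hypothetical label, e.g. "gene_alias"
    terms: list[str]  # genomic terms extracted from the question


def classify(question: str) -> Task:
    """Stand-in for the task classification agent (keyword rules, no LLM)."""
    words = [w.strip("?.,") for w in question.split()]
    # Crude assumption: upper-case tokens of 3+ characters are gene symbols.
    terms = [w for w in words if len(w) >= 3 and w.isupper()]
    query_type = "gene_alias" if "alias" in question.lower() else "general"
    return Task(query_type, terms)


def retrieve(task: Task, sources: dict[str, Source]) -> list[tuple[str, str]]:
    """Query every source for every extracted term."""
    return [(name, fetch(term))
            for term in task.terms
            for name, fetch in sources.items()]


def consolidate(snippets: list[tuple[str, str]]) -> list[str]:
    """Normalize whitespace and drop empty or duplicate snippets."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for name, text in snippets:
        text = " ".join(text.split())
        if text and text not in seen:
            seen.add(text)
            cleaned.append(f"[{name}] {text}")
    return cleaned


def synthesize(question: str, evidence: list[str]) -> str:
    """Stand-in for the synthesis agent: lists the evidence under the question."""
    if not evidence:
        return "No evidence retrieved."
    return f"Q: {question}\n" + "\n".join(evidence)


def answer(question: str, sources: dict[str, Source]) -> str:
    task = classify(question)
    return synthesize(question, consolidate(retrieve(task, sources)))


# Two stub sources that return the same fact with different whitespace.
stub_sources: dict[str, Source] = {
    "api_stub": lambda term: f"  {term}: record from a stubbed API  ",
    "html_stub": lambda term: f"{term}: record from a stubbed API",
}

if __name__ == "__main__":
    print(answer("What are the aliases of BRCA1?", stub_sources))
```

Keeping each stage behind a plain function signature mirrors the modularity claim in the abstract: under these assumptions, a new data source is added by registering another callable, without touching classification or synthesis.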

Record

	Faculty/Department
	
				Dipartimento di Ingegneria dell'Informazione - DEI
			
	Degree programme
	
				COMPUTER ENGINEERING Laurea Magistrale (D.M. 270/2004)
			
	Academic year
	
				2024
			
	English title
	
				Biomedical Question Answering: Extending GeneGPT with the code act paradigm
			
	Keywords
	
				Large language model
GeneGPT
Biomedical IR
Biomedical QA
LLM agents
			
	Supervisor
	
				SILVELLO, GIANMARIA
			
	Appears in collections:
	
				Master's theses

Files in this item:

File	Access	Size	Format
Computer_Engineering_MsC_Thesis___UniPD-26.pdf	open access	4.16 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/98449