Biomedical Question Answering: Extending GeneGPT with the code act paradigm
ABEDINI, KIMIA
2024/2025
Abstract
Understanding genomic information is fundamental to the advancement of biomedical research, precision medicine, and therapeutic development, but the complexity and distributed nature of genomic databases present significant barriers to efficient knowledge retrieval. Although Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, their application to specialized genomic question answering remains limited by challenges in accessing domain-specific databases, maintaining factual accuracy, and integrating information from heterogeneous sources. To address these limitations, we introduce GenomAgent, a multi-agent framework for genomic question answering that uses coordinated autonomous agents to process queries systematically. The system employs a task classification agent to identify query types and extract relevant genomic terms, followed by a distributed information retrieval mechanism that interfaces with multiple data sources through Model Context Protocol (MCP) web browser integration. The retrieved information undergoes automated consolidation and cleaning by dedicated processing agents before a final synthesis agent generates comprehensive, contextualized responses. We evaluated GenomAgent against GeneGPT, a pioneering system that uses NCBI API calls to answer genomic questions, using established benchmark datasets. Our results demonstrate that GenomAgent achieves performance comparable to or exceeding that of GeneGPT on all evaluated tasks while offering superior flexibility through its modular architecture. The adaptability of the methodology to various data sources, including API-based services and HTML text extraction, positions it as a generalizable framework applicable to diverse question-answering domains beyond genomics.
This work establishes a foundation for scalable, agent-based approaches to specialized knowledge retrieval in data-intensive scientific fields.

| File | Size | Format |
|---|---|---|
| Computer_Engineering_MsC_Thesis___UniPD-26.pdf (open access) | 4.16 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/98449