Validation of LLM-Automated Meta-Analysis: A Comparative Study with Human-Led Approaches

Giacomo Berti
Academic Year 2023/2024

Abstract

Background: Meta-analyses quantitatively synthesise evidence across studies through standardised statistical procedures, improving precision and resolving inconsistencies between individual trials. However, their execution requires multidisciplinary expertise, substantial time, and complex data extraction, which limits scalability. Recent advances in Large Language Models (LLMs) and structured frameworks such as the Model Context Protocol (MCP) offer promising opportunities to automate the mechanical components of meta-analytic workflows while preserving essential human oversight. Preliminary evidence suggests that LLMs can support study screening, structured data extraction, and generation of reproducible statistical code, though challenges remain regarding accuracy, transparency, and integration into clinical research practice.

Methods: We conducted a systematic validation of an AI-assisted workflow combining Meta-Analysis MCP and PDF Data Extractor. Twenty published meta-analyses across diverse clinical areas were selected to ensure heterogeneity in outcome types, study designs, and methodological complexity. Three parallel approaches were compared: LLM-only, Human-only, and Hybrid (AI extraction with human verification). Validation metrics included extraction accuracy (exact match, tolerance thresholds, completeness), concordance of meta-analytic results (effect size estimates, confidence intervals, heterogeneity), and time efficiency. Statistical comparisons were stratified by outcome type, number of included studies, and analytical complexity.

Results: The case studies reflected broad clinical and methodological diversity, including continuous, binary, time-to-event, and proportion outcomes, with frequent subgroup and sensitivity analyses. Accuracy, concordance, and efficiency outcomes were pooled across the 20 meta-analyses. Detailed results for one representative case study illustrate the performance differences between approaches, while summary statistics are reported for the remaining nineteen.

Discussion: This study provides empirical evidence on the strengths and limitations of LLM-assisted meta-analysis. The findings inform appropriate contexts for AI integration, clarify the ongoing role of human expertise, and outline requirements for responsible, transparent, and reproducible adoption of AI-driven tools in epidemiological research.
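To make the extraction-accuracy metrics named in the Methods concrete (exact match, tolerance thresholds, completeness), the following minimal Python sketch scores hypothetical LLM-extracted fields against a human-extracted reference. The field names and the 2% relative tolerance are illustrative assumptions, not the thesis's actual protocol.

```python
# Hypothetical sketch of the extraction-accuracy comparison: LLM-extracted
# values are scored against human-extracted reference values using exact
# match and a relative tolerance threshold. Field names and the 2%
# tolerance are assumptions for illustration only.

def accuracy_metrics(llm_values, human_values, rel_tol=0.02):
    """Return exact-match rate, within-tolerance rate, and completeness."""
    exact = within = extracted = 0
    for field, truth in human_values.items():
        value = llm_values.get(field)
        if value is None:
            continue                      # a missing field lowers completeness
        extracted += 1
        if value == truth:
            exact += 1
        if truth != 0 and abs(value - truth) / abs(truth) <= rel_tol:
            within += 1
    n = len(human_values)
    return {
        "exact_match": exact / n,
        "within_tolerance": within / n,
        "completeness": extracted / n,
    }

# Hypothetical extraction of four fields; the LLM misses one of them.
human = {"mean_treat": 12.4, "sd_treat": 3.1, "n_treat": 58, "n_ctrl": 60}
llm = {"mean_treat": 12.4, "sd_treat": 3.0, "n_treat": 58}
print(accuracy_metrics(llm, human))
```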
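Similarly, the concordance metrics compare pooled effect sizes, 95% confidence intervals, and heterogeneity across the three approaches. The sketch below shows one standard way such quantities are computed, a DerSimonian-Laird random-effects model with an I² heterogeneity statistic; it is a generic illustration with hypothetical inputs, not the workflow's actual statistical code.

```python
# Minimal sketch of the kind of pooling step whose outputs are validated:
# a DerSimonian-Laird random-effects meta-analysis returning the pooled
# effect, its 95% confidence interval, and I^2 heterogeneity. The study
# effects and standard errors below are hypothetical.
import math

def random_effects_pool(effects, std_errs):
    """Pool study-level effect sizes with DerSimonian-Laird weights."""
    k = len(effects)
    w_fixed = [1 / se**2 for se in std_errs]                  # inverse-variance weights
    theta_fixed = sum(w * y for w, y in zip(w_fixed, effects)) / sum(w_fixed)
    q = sum(w * (y - theta_fixed) ** 2                        # Cochran's Q
            for w, y in zip(w_fixed, effects))
    c = sum(w_fixed) - sum(w**2 for w in w_fixed) / sum(w_fixed)
    tau2 = max(0.0, (q - (k - 1)) / c)                        # between-study variance
    w_rand = [1 / (se**2 + tau2) for se in std_errs]
    theta = sum(w * y for w, y in zip(w_rand, effects)) / sum(w_rand)
    se_theta = math.sqrt(1 / sum(w_rand))
    ci = (theta - 1.96 * se_theta, theta + 1.96 * se_theta)   # 95% CI
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0  # I^2 in percent
    return theta, ci, i2

# Hypothetical standardised mean differences from five studies.
effects = [0.30, 0.45, 0.12, 0.50, 0.28]
std_errs = [0.10, 0.15, 0.12, 0.20, 0.11]
print(random_effects_pool(effects, std_errs))
```

Concordance between approaches would then be assessed by comparing these pooled estimates, interval bounds, and I² values, for example within the same kind of tolerance threshold applied to extraction.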
Keywords: LLM, Meta-Analysis, Validation, Human

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/103250