Multi-Agent Human-AI Framework for Transcriptions

The work introduces a multi-agent system that produces accurate video transcriptions by combining several LLMs with human feedback. The pipeline first generates multiple transcriptions with different speech recognition models, including Whisper, Vosk, and Facebook-MMS. A separate LLM then compares these transcriptions and highlights their differences. The system selects the n most accurate versions through majority voting. The selected transcriptions are presented to users in an interactive application, where they can be reviewed and manually refined. Once the transcript is finalized, a retrieval-augmented generation (RAG) model analyzes it and provides content-based recommendations to the user.
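The abstract does not specify how the majority vote is computed, so the following is only a minimal sketch under stated assumptions: each recognizer's whole output counts as one vote after simple normalization, and the n most frequent versions are kept. The function names `normalize` and `top_n_by_votes` and the sample transcripts are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different outputs match."""
    return " ".join(text.lower().split())


def top_n_by_votes(transcripts: dict[str, str], n: int = 1) -> list[tuple[str, int]]:
    """Return the n most frequent normalized transcripts with their vote counts.

    Assumption: one vote per recognizer on the full transcript.
    """
    votes = Counter(normalize(t) for t in transcripts.values())
    return votes.most_common(n)


# Hypothetical outputs from three recognizers (illustrative strings only).
candidates = {
    "whisper": "The quick brown fox",
    "vosk": "the quick  brown fox",
    "mms": "the quick brown box",
}

winners = top_n_by_votes(candidates, n=1)
print(winners)
```

Full transcripts rarely match exactly, so a practical system would more plausibly vote per segment or per aligned word; the whole-transcript vote here is chosen only to keep the illustration short.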

AKBULUT, ULASCAN

2025/2026

Record

Faculty/Department: Dipartimento di Matematica "Tullio Levi-Civita" - DM
Degree programme: Data Science, Master's degree (Laurea Magistrale, D.M. 270/2004)
Academic year: 2025
English title: Multi-Agent Human-AI Framework for Transcriptions
Keywords: LLMs, Transcriptions, Whisper
Supervisor: ERSEGHE, TOMASO
Appears in collections: Master's theses

Files in this item:

File	Size	Format
AkbulutUlascan2106046_MasterThesis.pdf (open access)	2.98 MB	Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/108222