Fine-tuning of pre-trained ASR models for transcription of Italian fiscal codes
BORASO, FRANCESCO
2024/2025
Abstract
This thesis presents an automatic speech recognition (ASR) system fine-tuned for the accurate extraction of Italian tax codes (codici fiscali) from spoken input. The work focuses on adapting general-purpose ASR models to domain-specific tasks, where precision in recognizing structured alphanumeric sequences is essential. The system is built on Whisper, a multilingual model developed by OpenAI, and has been adapted to improve accuracy in detecting tax codes pronounced in natural speech. A dedicated validation and error-checking mechanism ensures that only syntactically and logically valid codes are accepted, reducing the impact of minor transcription errors and improving robustness in real-world scenarios. This project demonstrates the effectiveness of fine-tuned ASR systems in specialized contexts and shows promise for applications in administrative, legal, or customer-service domains where accurate extraction of formal identifiers from speech is required.

| File | Size | Format |
|---|---|---|
| Master_Thesis.pdf (restricted access) | 878.49 kB | Adobe PDF |
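The abstract mentions a validation step that accepts only syntactically and logically valid codes. The full text is under restricted access, so the thesis's actual mechanism is not shown here; the following is only a minimal sketch of how such a check could look, assuming the standard 16-character codice fiscale layout and its publicly documented check-character rule. The names `is_valid_codice_fiscale` and `expected_check_char` are hypothetical, and the sketch deliberately ignores omocodia variants (codes in which digits are replaced by letters).

```python
import re

# Values assigned to characters in odd positions (1-indexed).
# Index 0-25 corresponds to A-Z; digits 0-9 reuse the entries of A-J.
ODD_VALUES = [1, 0, 5, 7, 9, 13, 15, 17, 19, 21, 2, 4, 18, 20,
              11, 3, 6, 8, 12, 14, 16, 10, 22, 25, 24, 23]

# Basic shape: surname(3) name(3) year(2) month(1) day(2) place(4) check(1).
# Assumption: omocodia substitutions are not handled in this sketch.
CF_PATTERN = re.compile(
    r"[A-Z]{6}[0-9]{2}[ABCDEHLMPRST][0-9]{2}[A-Z][0-9]{3}[A-Z]"
)


def _index(ch):
    """Map '0'-'9' to 0-9 and 'A'-'Z' to 0-25."""
    return int(ch) if ch.isdigit() else ord(ch) - ord("A")


def expected_check_char(first15):
    """Compute the check letter from the first 15 characters."""
    total = 0
    for pos, ch in enumerate(first15):
        idx = _index(ch)
        # pos is 0-indexed, so an even pos is an odd position in the
        # 1-indexed scheme and uses the ODD_VALUES table.
        total += ODD_VALUES[idx] if pos % 2 == 0 else idx
    return chr(ord("A") + total % 26)


def is_valid_codice_fiscale(text):
    """Return True if text has a valid shape, day field, and check letter."""
    code = re.sub(r"\s+", "", text).upper()
    if not CF_PATTERN.fullmatch(code):
        return False
    # Day of birth is 01-31 for men and 41-71 for women.
    day = int(code[9:11])
    if not (1 <= day <= 31 or 41 <= day <= 71):
        return False
    return expected_check_char(code[:15]) == code[15]
```

In an ASR pipeline, a check of this kind could be applied to each candidate transcription, so that a hypothesis with a single misrecognized character is rejected when its check letter no longer matches.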
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/91850