Adaptive Document Extraction: A Mixture of Experts Approach for Visma’s Smartscan

D'ESTE, LUCA
2024/2025

Abstract

The extraction of structured data from unstructured documents, such as invoices and receipts, is a significant challenge in the automation of financial processes. Visma's Smartscan engine employs machine learning to deliver high-accuracy results, yet it encounters performance bottlenecks when handling domain-specific variation across diverse customer datasets. This thesis investigates the potential of Mixture of Experts (MoE) models to improve Smartscan's accuracy and efficiency, both by specializing sub-models for distinct document facets (type, layout, and language) and by predicting multiple fields from a single model. In the first approach, the data is clustered into coherent subsets based on document characteristics, and a dedicated expert subnetwork is trained on each cluster. The MoE architecture employs a dynamic routing mechanism that directs each inference task to the most suitable expert(s), enabling tailored handling of diverse input distributions. In the second approach, a single model is trained with MoE layers to explore how document features can be shared to predict multiple fields at once. By tackling domain-specific biases, improving scalability, and enabling dynamic adaptation to diverse input distributions, the proposed MoE approach aims to demonstrate the viability of specialized machine learning models for optimizing document extraction workflows in production environments.
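The dynamic routing mechanism described in the abstract can be sketched as follows. This is a minimal, illustrative top-1 routing example only: the linear gate, its weights, and the two toy "expert" functions are hypothetical placeholders chosen for clarity, not components of Smartscan or of the thesis's actual models.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of gate scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(features, gate_weights, experts):
    """Score each expert with a linear gate and dispatch to the best one.

    Returns the chosen expert's prediction, its index, and the gate
    probability assigned to it.
    """
    # One logit per expert: dot product of the gate row with the features.
    logits = [sum(w * f for w, f in zip(row, features)) for row in gate_weights]
    probs = softmax(logits)
    best = max(range(len(experts)), key=lambda i: probs[i])
    return experts[best](features), best, probs[best]

# Toy experts: one "invoice" specialist, one "receipt" specialist
# (hypothetical stand-ins for per-cluster subnetworks).
experts = [lambda x: "invoice-fields", lambda x: "receipt-fields"]
gate_weights = [[1.0, -1.0], [-1.0, 1.0]]

# A feature vector that looks invoice-like routes to the first expert.
pred, idx, p = route([0.9, 0.1], gate_weights, experts)
```

In a trained MoE, the gate weights are learned jointly with the experts, and top-k variants may blend the predictions of several experts weighted by their gate probabilities.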
Keywords

Mixture of Experts; NLP; Deep Learning; Data Science; Document Extraction
Files in this item:

File: DEste_Luca.pdf (open access)
Size: 6.68 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/93334