Adaptive Document Extraction: A Mixture of Experts Approach for Visma’s Smartscan

D'ESTE, LUCA
2024/2025

Abstract

The extraction of structured data from unstructured documents, such as invoices and receipts, is a significant challenge in the automation of financial processes. Visma's Smartscan engine employs machine learning to deliver high-accuracy results, yet it encounters performance bottlenecks when handling domain-specific variation across diverse customer datasets. This thesis investigates the potential of Mixture of Experts (MoE) models to improve Smartscan's accuracy and efficiency, both by specializing sub-models for distinct document facets (type, layout, and language) and by predicting multiple fields from a single model. In the first approach, the data is clustered into coherent subsets based on document characteristics, and a dedicated expert subnetwork is trained on each cluster. The MoE architecture employs a dynamic routing mechanism that directs each inference task to the most suitable expert(s), enabling tailored handling of diverse input distributions. In the second approach, a single model is trained with MoE layers to explore how document features can be shared to predict multiple fields at once. By tackling domain-specific biases, improving scalability, and enabling dynamic adaptation to diverse input distributions, the proposed MoE approach aims to demonstrate the viability of specialized machine learning models for optimizing document extraction workflows in production environments.
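The dynamic routing mechanism described in the abstract can be sketched as follows. This is a minimal, illustrative top-1 routing example only: the linear gate, its weights, and the two toy "expert" functions are hypothetical placeholders chosen for clarity, not components of Smartscan or of the thesis's actual models.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of gate scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(features, gate_weights, experts):
    """Score each expert with a linear gate and dispatch to the best one.

    Returns the chosen expert's prediction, its index, and the gate
    probability assigned to it.
    """
    # One logit per expert: dot product of the gate row with the features.
    logits = [sum(w * f for w, f in zip(row, features)) for row in gate_weights]
    probs = softmax(logits)
    best = max(range(len(experts)), key=lambda i: probs[i])
    return experts[best](features), best, probs[best]

# Toy experts: one "invoice" specialist, one "receipt" specialist
# (hypothetical stand-ins for per-cluster subnetworks).
experts = [lambda x: "invoice-fields", lambda x: "receipt-fields"]
gate_weights = [[1.0, -1.0], [-1.0, 1.0]]

# A feature vector that looks invoice-like routes to the first expert.
pred, idx, p = route([0.9, 0.1], gate_weights, experts)
```

In a trained MoE, the gate weights are learned jointly with the experts, and top-k variants may blend the predictions of several experts weighted by their gate probabilities.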
Keywords

Mixture of Experts; NLP; Deep Learning; Data Science; Document Extraction
Files in this item:

File: DEste_Luca.pdf (open access)
Size: 6.68 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/93334