An Agent-Driven Approach to Voice Interaction in Smart Commercial Ovens

BOSCOLO BACHETO, MARTINA
2025/2026

Abstract

Professional cooking environments involve complex equipment that is operated under time pressure and multitasking conditions, often by users with different levels of expertise. In this context, voice interaction offers a natural and efficient way to control advanced devices such as commercial ovens. This thesis presents the design and implementation of a voice assistant for smart ovens, developed for Unox S.p.A. The system leverages Large Language Models (LLMs) to overcome the limitations of traditional intent-based approaches, enabling more flexible and natural user interaction. Following an exploration of existing technologies and theoretical foundations, the work began by improving the existing Google Dialogflow-based assistant through automatic multilingual data generation, which reduced manual effort while exposing the structural constraints of intent-driven systems. To address these limitations, a multi-stage pipeline was developed, combining automatic speech recognition (ASR), an LLM-based agent, and a text-to-speech (TTS) module. The agent interacts with the oven through a custom command-line interface, translating multilingual inputs into executable commands, handling multi-step operations, and managing errors. In addition to device control, the system supports knowledge-based queries about oven functionality and recipe-related reasoning. The system was evaluated using a custom dataset of scenarios covering multiple interaction categories, achieving a task success rate of 92.2%. Compared to the baseline system, the proposed solution supports a broader range of functionalities and natively enables multilingual interaction. The results demonstrate that LLM-based agentic systems can operate effectively within the constraints of an industrial platform. At the same time, the work highlights remaining challenges related to robustness, evaluation, and production readiness, suggesting directions for future development and real-world deployment.
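
The three-stage pipeline described in the abstract (ASR, then an LLM-based agent that issues commands through a command-line interface, then TTS) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the command names (`set-temperature`, `start`), and the stub behaviour are hypothetical and are not taken from the thesis or from any Unox interface. The real system would replace each stub with an actual speech recognizer, LLM agent, oven CLI, and speech synthesizer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CommandResult:
    ok: bool
    message: str


def fake_asr(audio: bytes) -> str:
    # Hypothetical stand-in for speech recognition: the "audio" is UTF-8 text.
    return audio.decode("utf-8")


def fake_agent(utterance: str) -> List[str]:
    # Hypothetical stand-in for the LLM agent: maps one known request
    # to a sequence of CLI commands (a multi-step operation).
    if "preheat" in utterance.lower():
        return ["set-temperature 180", "start"]
    return ["unknown"]


def fake_oven_cli(command: str) -> CommandResult:
    # Hypothetical stand-in for the oven CLI; command names are invented.
    name = command.split()[0]
    if name in {"set-temperature", "start"}:
        return CommandResult(True, f"done: {command}")
    return CommandResult(False, f"unsupported command: {command}")


def fake_tts(text: str) -> bytes:
    # Hypothetical stand-in for speech synthesis: returns the text as bytes.
    return text.encode("utf-8")


def handle_request(
    audio: bytes,
    asr: Callable[[bytes], str] = fake_asr,
    agent: Callable[[str], List[str]] = fake_agent,
    cli: Callable[[str], CommandResult] = fake_oven_cli,
    tts: Callable[[str], bytes] = fake_tts,
) -> bytes:
    """Run one voice request through ASR -> agent -> CLI -> TTS."""
    utterance = asr(audio)
    executed = 0
    for command in agent(utterance):
        result = cli(command)
        if not result.ok:
            # Stop at the first failing step and report the error to the user.
            return tts(f"Sorry, I could not complete that ({result.message}).")
        executed += 1
    return tts(f"Completed {executed} step(s).")
```

Passing the stages as parameters keeps each one replaceable, so a sketch like this can be exercised with stubs before any real recognizer, model, or device is attached.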
2025
Keywords: Agentic AI, LLM, Voice Interaction, Conversational AI, AI Agents
Files in this item:
File: Boscolo_Bacheto_Martina.pdf
Access: Restricted
Size: 4.32 MB
Format: Adobe PDF

The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/106269