Enhancing Domain-Specific Machine Translation with Retrieval-Augmented Generation (RAG)

SHEIKHI, SAHAR
2024/2025

Abstract

This thesis investigates how retrieval-augmented generation can enhance domain-specific machine translation without the need for full model fine-tuning. The proposed system integrates retrieved examples into language model prompts, allowing for more accurate and context-aware translations of structured and technical content. A supporting data processing pipeline is developed to handle noisy bilingual datasets and enable incremental updates, ensuring the system remains adaptable and scalable. Preliminary results suggest improvements in translation quality and consistency, with evaluation conducted through a combination of established automated metrics and expert human assessment.
2024
Machine Translation
RAG
Domain Adaptation
Language Models
Files in this item:
File: final-thesis.pdf
Access: Restricted
Size: 1.89 MB
Format: Adobe PDF

The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/102092