Automated pattern recognition and JSON extraction for ETL development
PASSARO, GIOVANNI
2025/2026
Abstract
Transforming technical and functional specifications into ETL workflows is a complex and time-consuming process that requires careful interpretation and manual coding. This research explores an AI-driven approach to automate and optimize ETL development by leveraging Natural Language Processing (NLP) and Generative AI. The proposed methodology focuses on extracting meaningful patterns from documentation and generating structured code templates to accelerate development while maintaining flexibility for manual refinement. By integrating AI-powered automation with human expertise, this approach aims to improve efficiency, reduce implementation time, and enhance consistency in ETL pipeline creation. The study evaluates the approach's effectiveness through real-world test cases and discusses its potential impact on data engineering workflows.

| File | Size | Format |
|---|---|---|
| Tesi Giovanni Passaro.pdf (open access) | 1.66 MB | Adobe PDF |
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/108236