Design and Development of a Business Intelligence Pipeline for Retail Promotion Analysis
FAVALLI, GIANMARCO
2024/2025
Abstract
The large-scale retail sector faces increasingly intense competition, heightened by changing consumer habits and the digitalization of the market. In this scenario, the ability to formulate data-driven strategies is no longer optional but essential for maintaining and growing market share. The main research problem addressed in this thesis is the so-called strategic "information gap": information on competitors' promotional policies, although of significant strategic value, is conveyed mainly through unstructured channels such as promotional flyers, distributed as PDFs or images. Without a systematic, automated process to extract, aggregate and analyze this information, business decision-makers in pricing, marketing and category management lack a fundamental asset for strategic planning. The primary objective of this thesis is therefore the design, development and validation of a complete, end-to-end Business Intelligence (BI) pipeline. The pipeline automates the entire process: collecting promotional flyers from the main large-scale retailers, extracting key information (product, price, format, discount) in structured form through Optical Character Recognition (OCR) and Natural Language Processing (NLP), and loading the resulting data into an Enterprise Performance Management (EPM) platform such as Board International. The originality and contribution of this work lie in integrating heterogeneous technologies (web scraping, computer vision, OCR, BI) to solve a real business problem and deliver a concrete competitive advantage.
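The abstract does not specify how the extraction step turns OCR output into structured fields. As an illustration only, assume the OCR stage has already produced one text line per promotional item in Italian formatting (euro sign, comma as decimal separator, discount written as "-30%"). Under that assumption, a simple rule-based parser could look like the sketch below. The regular expressions, the `PromoItem` record, the `parse_promo_line` and `to_csv` helpers, and the flat-CSV hand-off to the EPM platform are all hypothetical. They are not the thesis's implementation, which the abstract describes as combining OCR with NLP techniques.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass, fields
from typing import Optional

# Hypothetical patterns for Italian-style flyer text (comma as decimal separator).
PRICE_RE = re.compile(r"€\s*(\d+(?:,\d{1,2})?)")             # e.g. "€ 0,89"
DISCOUNT_RE = re.compile(r"-\s*(\d{1,2})\s*%")                # e.g. "-30%"
FORMAT_RE = re.compile(r"(\d+(?:,\d+)?)\s*(kg|g|ml|l|cl)\b",  # e.g. "500 g"
                       re.IGNORECASE)


@dataclass
class PromoItem:
    product: str
    price_eur: Optional[float]
    pack_format: Optional[str]
    discount_pct: Optional[int]


def parse_promo_line(text: str) -> PromoItem:
    """Split one OCR'd flyer line into product, price, format and discount."""
    price = PRICE_RE.search(text)
    discount = DISCOUNT_RE.search(text)
    fmt = FORMAT_RE.search(text)

    # Assumption: the product name is whatever precedes the first matched token.
    cut = min((m.start() for m in (price, discount, fmt) if m), default=len(text))
    product = text[:cut].strip()

    return PromoItem(
        product=product,
        price_eur=float(price.group(1).replace(",", ".")) if price else None,
        pack_format=f"{fmt.group(1)} {fmt.group(2).lower()}" if fmt else None,
        discount_pct=int(discount.group(1)) if discount else None,
    )


def to_csv(items: list[PromoItem]) -> str:
    """Serialize items to a flat CSV, assuming a file-based import into the EPM platform."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(PromoItem)])
    writer.writeheader()
    for item in items:
        writer.writerow(asdict(item))  # None values are written as empty fields
    return buf.getvalue()


if __name__ == "__main__":
    lines = ["Pasta Barilla 500 g € 0,89 -30%", "Olio extravergine 1 l € 5,99"]
    print(to_csv([parse_promo_line(t) for t in lines]))
```

Hand-written patterns like these are easy to audit but fail on layouts and OCR errors they were not written for, so they serve here only to make the product/price/format/discount structure concrete.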
The text of this website © Università degli Studi di Padova. Full texts are published under a non-exclusive license. Metadata are under a CC0 license.
https://hdl.handle.net/20.500.12608/99736