Evolution of Change Data Capture: A Transition from Trigger-based to Log-based Systems using Apache Kafka
SETTIMO, LAURA
2025/2026
Abstract
This thesis presents the design, implementation, and production validation of a modern Change Data Capture pipeline at InfoCamere, the digital innovation company of the Italian Chambers of Commerce. The primary objective of the project was to modernize the data extraction process of the Italian Business Register, migrating from a legacy, synchronous trigger-based system to an asynchronous, log-based streaming architecture. The previous approach inherently caused write amplification, increased transaction latency, and consumed critical resources on the primary Oracle database. To resolve these structural inefficiencies, a new event-driven ecosystem was engineered. The architecture leverages Debezium to passively extract data modifications directly from Oracle's native redo logs, entirely decoupling the capture process from production workloads. The extracted events are then ingested into a fault-tolerant Apache Kafka infrastructure, deployed across Confluent and Cloudera environments, while Apache NiFi is employed to orchestrate, filter, and route the data streams in real time. The solution was successfully deployed and stress-tested in a production environment on highly transactional domains, seamlessly processing large volumes of transactional events without causing any performance degradation. Ultimately, this migration eliminates database bottlenecks and establishes a robust, scalable foundation for future real-time enterprise initiatives.
| File | Size | Format |
|---|---|---|
| Settimo_Laura.pdf (restricted access) | 2.38 MB | Adobe PDF |
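As an illustration of the log-based capture approach described in the abstract, a Debezium Oracle source connector is typically registered with Kafka Connect through a JSON configuration such as the sketch below. All hostnames, credentials, topic names, and table names here are placeholders for illustration only; they are not the thesis's actual settings.

```json
{
  "name": "oracle-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.oracle.OracleConnector",
    "database.hostname": "oracle-host.example.com",
    "database.port": "1521",
    "database.user": "cdc_user",
    "database.password": "********",
    "database.dbname": "ORCLCDB",
    "topic.prefix": "business-register",
    "table.include.list": "REGISTRY.COMPANIES",
    "log.mining.strategy": "online_catalog",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.business-register"
  }
}
```

With a configuration of this shape, the connector mines the database's redo logs via LogMiner and publishes each committed change as an event to a Kafka topic, so no triggers or polling queries run against the source tables.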
The text of this website © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/106860