Data Spaces: Building Trust in AI with Blockchain
DAL MAS, GIOVANNI
2024/2025
Abstract
Artificial intelligence is increasingly used in domains requiring trustworthy data, transparency, and auditability. The European Union promotes federated sector-specific Data Spaces to enable sovereign and interoperable data sharing, but integrating artificial intelligence workflows in such environments raises challenges related to data integrity, provenance, and accountability. Blockchain technology has been proposed as a mechanism to strengthen trust in data-driven systems due to its immutability and tamper-evident properties, yet empirical studies evaluating its computational feasibility within practical machine-learning pipelines remain limited. This thesis develops a conceptual framework connecting Data Space trust requirements with blockchain’s provenance capabilities, and implements a reproducible experiment in Google Colab. A synthetic but domain-plausible regenerative-agriculture dataset is combined with several regression models and a lightweight blockchain module that logs hashed artefacts from each step of the pipeline. The results show that blockchain provenance introduces negligible computational overhead, preserves predictive performance, and reliably detects tampering of raw data, preprocessed features, model artefacts, and evaluation metrics. The findings suggest that blockchain-backed provenance can enhance trust and accountability in artificial intelligence workflows deployed in federated data ecosystems such as regenerative-agriculture Data Spaces. The thesis concludes with a discussion of limitations, including the problem of ensuring data truthfulness at the moment of ingestion, and outlines directions for integrating trusted hardware, multi-source validation, and distributed ledger technologies within real-world Data Space deployments.

| File | Size | Format |
|---|---|---|
| DalMas_Giovanni.pdf (open access) | 939.42 kB | Adobe PDF |
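
The abstract describes a lightweight blockchain module that logs hashed artefacts from each pipeline step and detects tampering. As a rough illustration of that idea only, the sketch below builds a SHA-256 hash chain in Python. It is not the thesis's implementation: the class `ProvenanceChain`, its method names, the step labels, and the sample bytes are all hypothetical placeholders chosen for this example.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


class ProvenanceChain:
    """Hypothetical minimal hash chain; not the module used in the thesis."""

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _block_hash(index, step, artefact_hash, prev_hash):
        # Serialise the block fields deterministically before hashing.
        payload = json.dumps(
            {
                "index": index,
                "step": step,
                "artefact_hash": artefact_hash,
                "prev_hash": prev_hash,
            },
            sort_keys=True,
        ).encode("utf-8")
        return sha256_hex(payload)

    def log(self, step: str, artefact: bytes) -> dict:
        """Append a block that records the hash of one pipeline artefact."""
        index = len(self.blocks)
        prev_hash = self.blocks[-1]["hash"] if self.blocks else GENESIS_HASH
        artefact_hash = sha256_hex(artefact)
        block = {
            "index": index,
            "step": step,
            "artefact_hash": artefact_hash,
            "prev_hash": prev_hash,
            "hash": self._block_hash(index, step, artefact_hash, prev_hash),
        }
        self.blocks.append(block)
        return block

    def chain_is_valid(self) -> bool:
        """Recompute every block hash and check the links between blocks."""
        prev_hash = GENESIS_HASH
        for i, block in enumerate(self.blocks):
            if block["index"] != i or block["prev_hash"] != prev_hash:
                return False
            expected = self._block_hash(
                block["index"], block["step"],
                block["artefact_hash"], block["prev_hash"],
            )
            if block["hash"] != expected:
                return False
            prev_hash = block["hash"]
        return True

    def artefact_matches(self, step: str, artefact: bytes) -> bool:
        """Check an artefact against the hash logged for the given step."""
        for block in self.blocks:
            if block["step"] == step:
                return block["artefact_hash"] == sha256_hex(artefact)
        return False


# Illustrative placeholder artefacts, not data or results from the thesis.
raw_data = b"plot_id,cover_crop\n1,yes\n2,no\n"
metrics = json.dumps({"metric": "placeholder"}, sort_keys=True).encode("utf-8")

chain = ProvenanceChain()
chain.log("raw_data", raw_data)
chain.log("evaluation_metrics", metrics)
```

In this sketch, a modified artefact fails `artefact_matches` because its digest no longer equals the logged one, and an edited block fails `chain_is_valid` because its stored hash no longer matches the recomputed one. As the abstract notes, such a check says nothing about whether the data were truthful at the moment of ingestion.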
The text of this website is © Università degli studi di Padova. Full texts are published under a non-exclusive license. Metadata are released under a CC0 license.
https://hdl.handle.net/20.500.12608/102104