Data Spaces: Building Trust in AI with Blockchain

DAL MAS, GIOVANNI
2024/2025

Abstract

Artificial intelligence is increasingly used in domains requiring trustworthy data, transparency, and auditability. The European Union promotes federated sector-specific Data Spaces to enable sovereign and interoperable data sharing, but integrating artificial intelligence workflows in such environments raises challenges related to data integrity, provenance, and accountability. Blockchain technology has been proposed as a mechanism to strengthen trust in data-driven systems due to its immutability and tamper-evident properties, yet empirical studies evaluating its computational feasibility within practical machine-learning pipelines remain limited. This thesis develops a conceptual framework connecting Data Space trust requirements with blockchain’s provenance capabilities, and implements a reproducible experiment in Google Colab. A synthetic but domain-plausible regenerative-agriculture dataset is combined with several regression models and a lightweight blockchain module that logs hashed artefacts from each step of the pipeline. The results show that blockchain provenance introduces negligible computational overhead, preserves predictive performance, and reliably detects tampering of raw data, preprocessed features, model artefacts, and evaluation metrics. The findings suggest that blockchain-backed provenance can enhance trust and accountability in artificial intelligence workflows deployed in federated data ecosystems such as regenerative-agriculture Data Spaces. The thesis concludes with a discussion of limitations, including the problem of ensuring data truthfulness at the moment of ingestion, and outlines directions for integrating trusted hardware, multi-source validation, and distributed ledger technologies within real-world Data Space deployments.
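The idea of logging hashed artefacts from each pipeline step can be illustrated with a minimal sketch. It is not the thesis's actual module: the class name `ProvenanceChain`, its methods, the block fields, and the choice of SHA-256 are all assumptions made for illustration. Each block stores the hash of one artefact plus the hash of the previous block, so a change to either an artefact or an earlier ledger entry becomes detectable.

```python
import hashlib
import json
import time


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


class ProvenanceChain:
    """Hypothetical in-memory hash chain; an illustration, not the thesis code."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first block

    def __init__(self):
        self.blocks = []

    @staticmethod
    def _block_hash(body: dict) -> str:
        # Serialize with sorted keys so the same body always hashes identically.
        return sha256_hex(json.dumps(body, sort_keys=True).encode("utf-8"))

    def log(self, step: str, artefact: bytes) -> dict:
        """Append a block recording the hash of one pipeline artefact."""
        prev_hash = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        body = {
            "index": len(self.blocks),
            "step": step,
            "artefact_hash": sha256_hex(artefact),
            "prev_hash": prev_hash,
            "timestamp": time.time(),
        }
        block = dict(body, hash=self._block_hash(body))
        self.blocks.append(block)
        return block

    def chain_is_valid(self) -> bool:
        """Check that every block's hash and back-link are still consistent."""
        prev_hash = self.GENESIS
        for block in self.blocks:
            body = {k: v for k, v in block.items() if k != "hash"}
            if block["prev_hash"] != prev_hash:
                return False
            if block["hash"] != self._block_hash(body):
                return False
            prev_hash = block["hash"]
        return True

    def artefact_matches(self, index: int, artefact: bytes) -> bool:
        """Compare an artefact's current bytes with the hash logged for it."""
        return self.blocks[index]["artefact_hash"] == sha256_hex(artefact)


if __name__ == "__main__":
    # Made-up placeholder artefacts, not data from the thesis.
    raw = b"plot_id,soil_carbon\n1,2.4\n2,3.1\n"
    chain = ProvenanceChain()
    chain.log("raw_data", raw)
    chain.log("metrics", json.dumps({"r2": 0.5}).encode("utf-8"))

    tampered = raw.replace(b"2.4", b"9.9")
    print("chain valid:", chain.chain_is_valid())
    print("raw data unchanged:", chain.artefact_matches(0, raw))
    print("tampered data accepted:", chain.artefact_matches(0, tampered))
```

In this sketch only digests are stored, never the artefacts themselves, which is one plausible reason the logging cost stays small relative to model training. A deployed Data Space would replace the in-memory list with a distributed ledger.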
2024
Keywords: Data Spaces, Blockchain, Trust, Decentralization, Web3
Files in this item:
DalMas_Giovanni.pdf (open access, 939.42 kB, Adobe PDF)


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12608/102104