Abstract:
Cooperative data-sharing systems are essential in domains that manage sensitive information, such as healthcare, finance, and supply chain operations. These sectors frequently suffer from data-quality issues, particularly when information is contributed by multiple independent sources. Poor or inconsistent data can lead to unreliable machine-learning (ML) predictions and raise ethical concerns regarding fairness, accuracy, and trustworthiness. Decentralized technologies such as blockchain offer transparency, security, and immutability, all of which are valuable for handling sensitive data, but they lack inherent mechanisms to verify the quality of the data they store.
This research aims to address these challenges by integrating blockchain smart contracts with machine-learning techniques to create a framework that ensures high-quality data for ML model training.
The study adopts the PureChain architecture, a blockchain-supported ML platform designed to enhance data reliability. Data contributors upload raw information to a decentralized system where smart-contract rules automatically evaluate and enforce data-quality standards. The methodology consists of three main stages: data preprocessing, implementation of smart-contract validation rules, and ML model training and validation. In PureChain, smart contracts serve a dual function: safeguarding data privacy and preventing the submission of low-quality or malicious inputs. Machine-learning operators subsequently assess the performance of models trained exclusively on validated, high-quality datasets. A prototype implementation is used to evaluate the feasibility, scalability, efficiency, and effectiveness of this decentralized quality-assurance approach.
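To make the validation stage concrete, the following is a minimal off-chain sketch in Python of a rule-based quality gate of the kind the smart contracts are described as enforcing. It is not the PureChain implementation: the `QualityGate` class, the rule names, the record fields (`patient_id`, `heart_rate`), and the numeric bounds are all hypothetical and chosen only for illustration. The sketch shows submissions being checked against every rule, with only fully passing records admitted to the training set.

```python
from dataclasses import dataclass, field
from typing import Callable

Record = dict[str, object]
Rule = Callable[[Record], bool]


@dataclass
class QualityGate:
    """Hypothetical off-chain stand-in for a smart-contract validation step."""

    rules: dict[str, Rule]
    accepted: list[Record] = field(default_factory=list)
    rejected: list[tuple[Record, list[str]]] = field(default_factory=list)

    def submit(self, record: Record) -> bool:
        # Collect the names of all rules the record fails, in rule order.
        failed = [name for name, rule in self.rules.items() if not rule(record)]
        if failed:
            # Keep the record with its failure reasons for auditing.
            self.rejected.append((record, failed))
            return False
        self.accepted.append(record)
        return True


def is_number(value: object) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Illustrative rules only; field names and bounds are assumptions.
RULES: dict[str, Rule] = {
    "complete": lambda r: all(
        r.get(k) is not None for k in ("patient_id", "heart_rate")
    ),
    "in_range": lambda r: is_number(r.get("heart_rate"))
    and 20 <= r["heart_rate"] <= 250,
}

gate = QualityGate(RULES)
submissions = [
    {"patient_id": "a1", "heart_rate": 72},    # passes both rules
    {"patient_id": "a2", "heart_rate": None},  # missing value
    {"patient_id": "a3", "heart_rate": 900},   # implausible value
]
results = [gate.submit(record) for record in submissions]

# Only validated records would be passed on to model training.
training_set = gate.accepted
```

In a deployed system the equivalent checks would run inside the contract, so that rejected submissions never reach the shared dataset. Keeping the reasons for rejection alongside each rejected record is a design choice made here to support auditing, and is not something the abstract specifies.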