Hybrid Privacy-Preserving Frameworks for Pattern Extraction in Sensitive Data: Balancing Utility and Confidentiality

dc.contributor.author Hetti Arachchige, Yasitha Anuranga Jayasekara
dc.date.accessioned 2026-03-10T06:25:28Z
dc.date.available 2026-03-10T06:25:28Z
dc.date.issued 2025
dc.identifier.citation Hetti Arachchige, Yasitha Anuranga Jayasekara (2025) Hybrid Privacy-Preserving Frameworks for Pattern Extraction in Sensitive Data: Balancing Utility and Confidentiality. MSc Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20200702
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2889
dc.description.abstract This work demonstrates how differential privacy (DP) can be integrated into diverse machine-learning methods while retaining practical utility on real-world tabular data. Using the Adult Income dataset, we compare baseline models with DP-enhanced versions of decision trees, K-Means, and neural networks. Decision trees adopt a noisy split-selection criterion under bounded features; K-Means uses noisy centroid updates with sensitivity constraints; and neural networks employ DP-SGD with gradient clipping, Gaussian noise, and privacy accounting. The modular pipeline separates preprocessing, training, evaluation, and reporting, standardizing features, encoding categories, and enforcing bounds. Baseline and DP models are trained under matched conditions, with grid searches over hyperparameters and privacy budgets (ε, δ). Evaluation spans utility metrics (accuracy for supervised tasks and silhouette scores for clustering) as well as privacy diagnostics, including membership-inference attacks (MIA), perturbation stability, and variability across repeated DP queries for aggregates such as means, variances, and counts. Privacy accounting reports cumulative ε and trade-offs across δ levels, with fixed seeds and logged hyperparameters ensuring reproducibility. Results show that DP consistently reduces MIA success and other overfitting indicators: model predictions remain stable, perturbations rarely flip outputs, and repeated queries exhibit controlled variability. Utility losses are modest and model-dependent: DP trees and neural networks experience small accuracy drops, and DP clustering maintains usable structure with limited silhouette reduction. DP aggregate queries remain close to true statistics at moderate ε values. Overall, the study provides a reusable end-to-end DP strategy, practical ε-selection guidance, and evidence that high-quality analytics and strong privacy guarantees can coexist. Future work includes broader datasets, larger models, federated DP, and richer adversarial testing. en_US
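The abstract's claim that DP aggregate queries stay close to the true statistics at moderate ε can be illustrated with a minimal Laplace-mechanism sketch. This is not the dissertation's code; the function name, the clipping bounds, and the use of NumPy are assumptions made for illustration:

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon, rng=None):
    """ε-differentially-private mean via the Laplace mechanism.

    Values are first clipped to [lower, upper], so changing one of the
    n records shifts the mean by at most (upper - lower) / n; that
    bound is the query's sensitivity and sets the Laplace noise scale.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = np.clip(np.asarray(values, dtype=float), lower, upper)
    sensitivity = (upper - lower) / len(clipped)
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise
```

Smaller ε means a larger noise scale and hence noisier answers; repeating the query many times (as in the variability diagnostics described above) reveals exactly this spread.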
dc.language.iso en en_US
dc.subject Membership Inference Attacks en_US
dc.subject Differential Privacy en_US
dc.subject Machine learning en_US
dc.subject Privacy Budget en_US
dc.title Hybrid Privacy-Preserving Frameworks for Pattern Extraction in Sensitive Data: Balancing Utility and Confidentiality en_US
dc.type Thesis en_US
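The DP-SGD recipe named in the abstract (per-example gradient clipping followed by Gaussian noise) can be sketched for a single training step. This is a hedged NumPy illustration, not the dissertation's implementation: the logistic-regression gradient is an assumed stand-in for whatever model the thesis trains, and privacy accounting over many steps is omitted:

```python
import numpy as np

def dp_sgd_step(w, X, y, clip_norm, noise_mult, lr, rng):
    """One DP-SGD step for logistic regression (illustrative sketch).

    Each example's gradient is rescaled so its L2 norm is at most
    clip_norm, the clipped gradients are summed, Gaussian noise with
    standard deviation clip_norm * noise_mult is added, and the noisy
    sum is averaged before the usual gradient-descent update.
    """
    preds = 1.0 / (1.0 + np.exp(-X @ w))           # sigmoid outputs
    per_example = (preds - y)[:, None] * X          # shape (n, d)
    norms = np.linalg.norm(per_example, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example * scale                   # bounded sensitivity
    noise = rng.normal(0.0, clip_norm * noise_mult, size=w.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(X)
    return w - lr * noisy_mean
```

Clipping bounds each example's influence on the update, which is what makes the added Gaussian noise yield a (ε, δ) guarantee once a privacy accountant tracks the noise multiplier and number of steps.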

