| dc.description.abstract |
This work demonstrates how differential privacy (DP) can be integrated into diverse machine-learning methods while retaining practical utility on real-world tabular data. Using the Adult Income dataset, we compare baseline models with DP-enhanced versions of Decision Trees, K-Means, and Neural Networks. Trees adopt a noisy split-selection criterion under bounded features; K-Means uses noisy centroid updates with sensitivity constraints; and neural networks employ DP-SGD with gradient clipping, Gaussian noise, and privacy accounting. A modular pipeline separates preprocessing, training, evaluation, and reporting; the preprocessing stage standardizes numerical features, encodes categorical variables, and enforces value bounds. Baseline and DP models are trained under matched conditions, with grid searches over hyperparameters and privacy budgets (ε, δ).
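The DP-SGD aggregation step mentioned above (per-example gradient clipping followed by Gaussian noise) can be sketched as follows; the function name, default values, and NumPy formulation are illustrative assumptions, not the study's actual implementation:

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.1, rng=None):
    """One DP-SGD aggregation step (illustrative sketch, not the study's code):
    clip each per-example gradient to clip_norm, sum the clipped gradients,
    add Gaussian noise calibrated to the clipping bound, and average."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down only if the gradient exceeds the clipping bound.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    total = np.sum(clipped, axis=0)
    # Noise standard deviation is tied to the sensitivity (clip_norm).
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=total.shape)
    return (total + noise) / len(per_example_grads)
```

Because clipping bounds each example's contribution, the Gaussian noise scale depends only on `clip_norm`, which is what makes the privacy accounting tractable.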
Evaluation spans utility metrics—accuracy for supervised tasks and silhouette scores for clustering—as well as privacy diagnostics, including membership-inference attacks (MIA), perturbation stability, and variability across repeated DP queries for aggregates such as means, variances, and counts. Privacy accounting reports cumulative ε and trade-offs across δ levels, with fixed seeds and logged hyperparameters ensuring reproducibility.
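As an illustration of the repeated aggregate queries evaluated above, a count can be privatized with the Laplace mechanism: a counting query has sensitivity 1, so adding Laplace noise with scale 1/ε yields ε-DP. The helper below is a hypothetical sketch, not the study's pipeline code:

```python
import numpy as np

def dp_count(data, predicate, epsilon, rng):
    """Laplace mechanism for a counting query (illustrative sketch).
    Sensitivity of a count is 1, so Laplace(1/epsilon) noise gives epsilon-DP."""
    true_count = sum(1 for x in data if predicate(x))
    return true_count + rng.laplace(scale=1.0 / epsilon)
```

Running such a query repeatedly with a fresh noise draw each time is how the variability diagnostics above can be measured: the spread of the noisy answers shrinks as ε grows.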
Results show that DP consistently reduces MIA success and other overfitting indicators: model predictions remain stable, perturbations rarely flip outputs, and repeated queries exhibit controlled variability. Utility losses are modest and model-dependent: DP trees and neural networks experience small accuracy drops, and DP clustering maintains usable structure with limited silhouette reduction. DP aggregate queries remain close to true statistics at moderate ε values. Overall, the study provides a reusable end-to-end DP strategy, practical ε-selection guidance, and evidence that high-quality analytics and strong privacy guarantees can coexist. Limitations and future work include broader datasets, larger models, federated DP, and richer adversarial testing. |
en_US |