PRIME-DD: Prioritized Representation Integration through a Multi-phase Hybrid Distillation of Importance Weights and Trajectory Signals for Scalable Dataset Compression

dc.contributor.author Fernando, Wiras
dc.date.accessioned 2026-03-26T07:32:12Z
dc.date.available 2026-03-26T07:32:12Z
dc.date.issued 2025
dc.identifier.citation Fernando, Wiras (2025) PRIME-DD: Prioritized Representation Integration through a Multi-phase Hybrid Distillation of Importance Weights and Trajectory Signals for Scalable Dataset Compression. BSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20200538
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/3074
dc.description.abstract The exponential growth of data in machine learning has introduced substantial challenges in terms of computational cost, storage, and model training efficiency, particularly for large-scale and resource-intensive applications. Dataset distillation, which aims to synthesize smaller, representative datasets without sacrificing model performance, has emerged as a promising solution. However, existing techniques often suffer from poor scalability, lack of fairness, and limited generalization across diverse model architectures. This research proposes a novel hybrid framework, PRIME-DD (Prioritized Representation Integration through a Multi-phase Hybrid Distillation of Importance Weights and Trajectory Signals for Scalable Dataset Compression), that integrates Importance-Aware Adaptive Distillation (IADD) with Trajectory Matching (TM) to address these limitations. The proposed system dynamically identifies and retains high-value data points based on uncertainty, misclassification, and class balance, while aligning the training trajectories of student models to those of the teacher for improved convergence. A fully functional prototype was developed using PyTorch and React, with real-time distillation feedback, synthetic sample visualization, and historical benchmarking support. The framework was evaluated on benchmark datasets such as CIFAR-10, CIFAR-100, SVHN, MNIST, and FashionMNIST, and tested across multiple architectures to validate generalization and robustness. Experimental results demonstrate significant improvements in dataset compactness, training time, and cross-architecture accuracy, while also incorporating mechanisms to reduce sampling bias and improve fairness. This research not only advances the state of dataset distillation but also contributes a scalable, explainable, and ethically sound solution suitable for real-world machine learning systems. en_US
dc.language.iso en en_US
dc.subject Dataset en_US
dc.subject Scalability en_US
dc.subject Efficient en_US
dc.subject Machine Learning en_US
dc.title PRIME-DD: Prioritized Representation Integration through a Multi-phase Hybrid Distillation of Importance Weights and Trajectory Signals for Scalable Dataset Compression en_US
dc.type Thesis en_US
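
The abstract states that high-value data points are retained based on uncertainty, misclassification, and class balance. A minimal sketch of how such an importance score could combine those three signals is shown below; the function name, the entropy-based uncertainty measure, and the equal-weight sum are illustrative assumptions, not details taken from the dissertation itself.

```python
import math
from collections import Counter

def importance_scores(probs, labels, preds):
    """Hypothetical importance score combining the three signals named in
    the abstract: predictive uncertainty (Shannon entropy of the model's
    class probabilities), a misclassification flag, and a class-rarity
    term that favors under-represented classes. The equal weighting of
    the three terms is an assumption for illustration only."""
    counts = Counter(labels)          # per-class frequency in the batch
    n = len(labels)
    scores = []
    for p, y, yhat in zip(probs, labels, preds):
        entropy = -sum(q * math.log(q) for q in p if q > 0)  # uncertainty
        miss = 1.0 if yhat != y else 0.0                     # misclassified?
        rarity = 1.0 - counts[y] / n                         # class balance
        scores.append(entropy + miss + rarity)
    return scores
```

Under this sketch, a sample that is both uncertain and misclassified scores higher than a confidently correct one, so it would be prioritized for retention in the distilled set.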

