PRIME-DD: Prioritized Representation Integration through a Multi-phase Hybrid Distillation of Importance Weights and Trajectory Signals for Scalable Dataset Compression

dc.contributor.author Fernando, Wiras
dc.date.accessioned 2026-03-26T07:32:12Z
dc.date.available 2026-03-26T07:32:12Z
dc.date.issued 2025
dc.identifier.citation Fernando, Wiras (2025) PRIME-DD: Prioritized Representation Integration through a Multi-phase Hybrid Distillation of Importance Weights and Trajectory Signals for Scalable Dataset Compression. BSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20200538
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/3074
dc.description.abstract The exponential growth of data in machine learning has introduced substantial challenges in terms of computational cost, storage, and model training efficiency, particularly for large-scale and resource-intensive applications. Dataset distillation, which aims to synthesize smaller, representative datasets without sacrificing model performance, has emerged as a promising solution. However, existing techniques often suffer from poor scalability, lack of fairness, and limited generalization across diverse model architectures. This research proposes a novel hybrid framework, PRIME-DD (Prioritized Representation Integration through a Multi-phase Hybrid Distillation of Importance Weights and Trajectory Signals for Scalable Dataset Compression), that integrates Importance-Aware Adaptive Distillation (IADD) with Trajectory Matching (TM) to address these limitations. The proposed system dynamically identifies and retains high-value data points based on uncertainty, misclassification, and class balance, while aligning the training trajectories of student models to those of the teacher for improved convergence. A fully functional prototype was developed using PyTorch and React, with real-time distillation feedback, synthetic sample visualization, and historical benchmarking support. The framework was evaluated on benchmark datasets such as CIFAR-10, CIFAR-100, SVHN, MNIST, and FashionMNIST, and tested across multiple architectures to validate generalization and robustness. Experimental results demonstrate significant improvements in dataset compactness, training time, and cross-architecture accuracy, while also incorporating mechanisms to reduce sampling bias and improve fairness. This research not only advances the state of dataset distillation but also contributes a scalable, explainable, and ethically sound solution suitable for real-world machine learning systems. en_US
dc.language.iso en en_US
dc.subject Dataset en_US
dc.subject Scalability en_US
dc.subject Efficient en_US
dc.subject Machine Learning en_US
dc.title PRIME-DD: Prioritized Representation Integration through a Multi-phase Hybrid Distillation of Importance Weights and Trajectory Signals for Scalable Dataset Compression en_US
dc.type Thesis en_US
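
The abstract states that high-value data points are retained based on uncertainty, misclassification, and class balance. A minimal sketch of how such an importance score could combine those three signals is shown below; the function name, the entropy-based uncertainty measure, and the equal-weight sum are illustrative assumptions, not details taken from the dissertation itself.

```python
import math
from collections import Counter

def importance_scores(probs, labels, preds):
    """Hypothetical importance score combining the three signals named in
    the abstract: predictive uncertainty (Shannon entropy of the model's
    class probabilities), a misclassification flag, and a class-rarity
    term that favors under-represented classes. The equal weighting of
    the three terms is an assumption for illustration only."""
    counts = Counter(labels)          # per-class frequency in the batch
    n = len(labels)
    scores = []
    for p, y, yhat in zip(probs, labels, preds):
        entropy = -sum(q * math.log(q) for q in p if q > 0)  # uncertainty
        miss = 1.0 if yhat != y else 0.0                     # misclassified?
        rarity = 1.0 - counts[y] / n                         # class balance
        scores.append(entropy + miss + rarity)
    return scores
```

Under this sketch, a sample that is both uncertain and misclassified scores higher than a confidently correct one, so it would be prioritized for retention in the distilled set.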

