Abstract:
Class imbalance is a major challenge in machine learning, especially in tabular datasets where
the classes of greatest interest are often the rarest. Models trained on such data tend to
favor the majority classes, which compromises predictive reliability precisely in the scenarios
where accurate minority-class detection is essential. To address this issue,
this thesis presents Harmony AI, an automated and adaptive system that identifies
class imbalance patterns and recommends the most suitable resampling techniques based on
dataset characteristics. Leveraging a trained meta-learning model and domain-agnostic feature
engineering, Harmony AI dynamically selects among techniques such as SMOTE and random
undersampling to optimize classification performance. The system is designed for ease of use,
enabling users with little to no machine learning expertise to upload, process, and export balanced
datasets through a user-friendly interface. Testing across diverse tabular datasets
shows that Harmony AI improves recall while maintaining high precision, and that it
outperforms manually selected resampling methods in most cases. The proposed framework contributes to
automated machine learning (AutoML) by offering a general, efficient, and interpretable solution
to class imbalance in real-world data.
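
As a rough illustration of the ideas summarized above, the following minimal sketch computes one simple dataset characteristic (the ratio of majority to minority class counts) and applies random undersampling, one of the techniques named in the abstract. It is not Harmony AI's implementation: the function names, the fixed threshold, and the rule in `recommend` are hypothetical stand-ins for the trained meta-learning model, chosen only to make the example self-contained.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return majority count divided by minority count (1.0 means balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so each class keeps as many rows as the rarest class."""
    rng = random.Random(seed)
    target = min(Counter(labels).values())

    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Sample `target` rows without replacement from every class.
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def recommend(labels, threshold=1.5):
    """Hypothetical rule of thumb, standing in for a learned recommender."""
    if imbalance_ratio(labels) > threshold:
        return "random_undersampling"
    return "none"


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 10      # 90 majority rows, 10 minority rows
    rows = list(range(len(labels)))   # placeholder feature rows
    print(imbalance_ratio(labels), recommend(labels))
    _, balanced = random_undersample(rows, labels)
    print(Counter(balanced))
```

A real recommender would replace the single threshold with a model trained on many dataset characteristics; the sketch only shows where such a decision would sit relative to the resampling step.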