dc.description.abstract |
"This study examines the challenges of detecting fraudulent job advertisements in imbalanced
datasets, a common problem in machine learning (ML) applications. Using a large dataset of fake
and genuine job advertisements, it evaluates how effectively several ML algorithms identify
fraudulent postings.
Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), K-Nearest Neighbors
(KNN), Logistic Regression (LR), and Neural Network (NN) classifiers are compared on accuracy,
precision, recall, and F1 score, and Receiver Operating Characteristic (ROC) curve analysis is
used to assess how well each model separates fake from genuine advertisements. To
address the imbalance between fake and legitimate cases in the dataset, the study investigates
oversampling strategies that reduce bias towards the majority class and improve the classifiers'
predictive capability. Neural Networks emerge as the most promising classifier, achieving higher
accuracy than the other models despite the class imbalance. Notably, the use of oversampling
approaches, such as the Synthetic Minority Over-sampling Technique (SMOTE) or the
Adaptive Synthetic Sampling Method (ADASYN), results in significant improvements in
classifier performance measures. Despite advances in detection accuracy, precision, recall, and
F1 score, the study recognises the limits of working with imbalanced datasets. Challenges
remain in ensuring optimal performance across all classes, especially when the minority class
is substantially underrepresented.
Furthermore, relying solely on traditional evaluation criteria such as accuracy and precision
may fail to convey the intricacies of classifier performance. To address these constraints, the
study proposes several directions for future research. These include developing more advanced
oversampling strategies, refining evaluation metrics to better reflect model performance on
imbalanced classes, and exploring ensemble learning methods. Pursuing these directions should
help future research mitigate the difficulties of imbalanced datasets, paving the way for more
robust and efficient fraud detection systems." |
en_US |