Machine Learning Based Fraud Detection In Health Insurance

Dharmavijaya, Thilanka

dc.contributor.author	Dharmavijaya, Thilanka
dc.date.accessioned	2025-07-01T03:23:34Z
dc.date.available	2025-07-01T03:23:34Z
dc.date.issued	2024
dc.identifier.citation	Dharmavijaya, Thilanka (2024) Machine Learning Based Fraud Detection In Health Insurance. MSc. Dissertation, Informatics Institute of Technology	en_US
dc.identifier.issn	2018561
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/2796
dc.description.abstract	"Insurance fraud is a growing problem in the economic sector, marked by its constantly changing nature and increasing complexity. As fraudsters continue to develop new strategies, conventional detection methods are proving insufficient, and more efficient and creative alternatives are required. This study improves the precision and adaptability of insurance fraud detection by applying a machine learning approach. It proposes a new model that takes advantage of the natural data imbalance in fraud detection scenarios. The model not only improves fraud identification rates but also offers insight into how fraud processes work. By combining domain-specific knowledge with advanced algorithms, the study aims to address current limitations in technique and to make a theoretical and practical contribution to the field of fraud prevention. The main objective is to employ machine learning techniques to detect fraudulent activity in the insurance industry. First, an Exploratory Data Analysis (EDA) was performed to gain insight into the properties of the dataset. To tackle the issue of imbalanced data, where the number of fraudulent claims is substantially lower than the number of legitimate ones, oversampling techniques were used to balance the dataset. Data preprocessing included transforming categorical variables with one-hot encoding, and the dataset was then split into distinct training and testing sets. Subsequently, a range of machine learning techniques was employed, including Random Forest, Decision Tree, and Support Vector Machine (SVM). Among these, the highest accuracy achieved was 0.87. Model performance was improved by fine-tuning hyperparameters and by addressing the imbalance between fraudulent and authentic claims."	en_US
dc.language.iso	en	en_US
dc.subject	Insurance Fraud Detection	en_US
dc.subject	Machine Learning Techniques	en_US
dc.subject	Anomaly Detection	en_US
dc.title	Machine Learning Based Fraud Detection In Health Insurance	en_US
dc.type	Thesis	en_US
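
Illustrative sketch (not part of the dissertation record): the preprocessing steps named in the abstract, one-hot encoding, a train/test split, and oversampling of the minority class, can be sketched in plain Python as below. The column names (claim_type, amount, is_fraud), the toy data, the 80/20 split, and all helper functions are hypothetical and chosen only for illustration; the dissertation's actual dataset, tooling, and ordering of steps are not stated in this record. The sketch assumes simple random oversampling with replacement, and it oversamples only the training portion, which is a common precaution so that duplicated rows cannot appear in both the training and testing sets.

```python
import random
from collections import Counter


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


def train_test_split(rows, test_fraction, rng):
    """Shuffle a copy of the rows and cut off a test portion."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def random_oversample(rows, label, rng):
    """Resample smaller classes with replacement up to the largest class size."""
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


# Hypothetical toy claims: roughly one fraudulent claim in ten.
rng = random.Random(0)
claims = [
    {
        "claim_type": rng.choice(["inpatient", "outpatient", "pharmacy"]),
        "amount": rng.randint(100, 5000),
        "is_fraud": 1 if i % 10 == 0 else 0,
    }
    for i in range(100)
]

encoded = one_hot(claims, "claim_type")
train, test = train_test_split(encoded, 0.2, rng)
balanced_train = random_oversample(train, "is_fraud", rng)
counts = Counter(row["is_fraud"] for row in balanced_train)
print("class counts after oversampling:", dict(counts))
```

The balanced training rows could then be passed to any classifier, such as the Random Forest, Decision Tree, or SVM models the abstract mentions, while the untouched test rows keep the original class ratio for evaluation.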