Digital Repository

Malicious URL detection based on machine learning.

dc.contributor.author Udayasanthiran, Ahtshayan
dc.date.accessioned 2024-04-30T09:01:02Z
dc.date.available 2024-04-30T09:01:02Z
dc.date.issued 2023
dc.identifier.citation Udayasanthiran, Ahtshayan (2023) Malicious URL detection based on machine learning. BSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 2019359
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2107
dc.description.abstract "The rise in internet usage has been accompanied by an increase in cyber threats, notably the widespread problem of malicious (rogue) URLs. Such URLs are frequently used in phishing scams, malware distribution, and other criminal activity. Current approaches to identifying malicious URLs, such as blacklisting and rule-based systems, have proven ineffective because these threats evolve quickly and because such approaches do not scale efficiently to the vast number of URLs created every day. There is therefore an urgent need for a more effective, precise, and scalable method of detecting and neutralizing the threats posed by malicious URLs. In response, the author has developed a machine learning model based on Logistic Regression and TfidfVectorizer. Logistic Regression, a machine learning approach commonly used for binary classification problems, was applied to classify URLs as benign or malicious. TfidfVectorizer, a feature extraction method that turns text data into numerical vectors, was used to convert the URLs into a form suitable for the Logistic Regression model. It assigns a weight to each token in a URL according to the token's frequency in that URL and its rarity across the whole dataset. The model was trained on a large dataset of URLs labeled as benign or malicious. Its performance was evaluated with standard metrics for binary classification: accuracy, precision, recall, and the F1 score. Testing was carried out on a dataset separate from the training set. The model achieved good accuracy, indicating that it classifies URLs correctly. Precision was also high, indicating a low proportion of false positives, and a high recall score showed that the model catches the bulk of malicious URLs.
The F1 score, the harmonic mean of precision and recall, further confirmed the model's solid performance. This novel approach to detecting malicious URLs marks a significant step forward in cybersecurity." en_US
dc.language.iso en en_US
dc.subject Uniform Resource Locator en_US
dc.subject Malware en_US
dc.subject Random Access Memory en_US
dc.title Malicious URL detection based on machine learning. en_US
dc.type Thesis en_US
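The abstract describes TF-IDF features feeding a Logistic Regression classifier, evaluated with accuracy, precision, recall, and F1. The record does not include the thesis code, so the following is only a minimal, dependency-free Python sketch of that kind of pipeline, not the author's implementation. Assumptions: the thesis presumably used scikit-learn's TfidfVectorizer and LogisticRegression; here the IDF formula mirrors scikit-learn's default smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1 with L2-normalized vectors, but the URL tokenizer, the learning rate, the epoch count, the 0.5 threshold, and the toy URLs are all hypothetical. Unlike scikit-learn's LogisticRegression, which applies L2 regularization by default, this version is unregularized.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(url):
    # Hypothetical tokenizer: lower-case the URL and split on common delimiters.
    return [t for t in re.split(r"[/.\-_?=&:]+", url.lower()) if t]


def fit_idf(urls):
    # Smoothed IDF: idf(t) = ln((1 + n) / (1 + df(t))) + 1, where df(t) is the
    # number of training URLs containing token t. Rarer tokens get larger weights.
    n = len(urls)
    df = Counter(t for u in urls for t in set(tokenize(u)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(url, idf):
    # Token count in this URL times its IDF, then L2-normalized to unit length.
    # Tokens never seen by fit_idf are ignored.
    counts = Counter(t for t in tokenize(url) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, w, b):
    # Linear score b + w.x over a sparse feature dict.
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())


def train_logreg(X, y, lr=0.5, epochs=200):
    # Unregularized logistic regression fitted by batch gradient descent
    # on the mean log-loss; labels are 1 = malicious, 0 = benign.
    w, b, n = {}, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = defaultdict(float), 0.0
        for x, label in zip(X, y):
            err = sigmoid(score(x, w, b)) - label
            for t, v in x.items():
                grad_w[t] += err * v
            grad_b += err
        for t, g in grad_w.items():
            w[t] = w.get(t, 0.0) - lr * g / n
        b -= lr * grad_b / n
    return w, b


def predict(url, idf, w, b, threshold=0.5):
    # 1 (malicious) if the estimated probability reaches the threshold, else 0.
    return int(sigmoid(score(tfidf(url, idf), w, b)) >= threshold)


def evaluate(y_true, y_pred):
    # Confusion-matrix counts, then the four metrics named in the abstract.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Toy, made-up data for illustration only (not the thesis dataset).
train_urls = [
    "https://www.example.com/about",
    "https://www.example.com/docs/guide",
    "http://login-verify.example.net/account",
    "http://verify-login.example.net/secure/account",
]
train_labels = [0, 0, 1, 1]

idf = fit_idf(train_urls)
w, b = train_logreg([tfidf(u, idf) for u in train_urls], train_labels)

test_urls = ["https://www.example.com/contact", "http://login.example.net/verify"]
test_labels = [0, 1]
preds = [predict(u, idf, w, b) for u in test_urls]
print(preds, evaluate(test_labels, preds))
```

Because of the IDF factor, a token that occurs in every training URL (here "example") gets the minimum weight of 1, while tokens confined to one class dominate the linear score. The toy run only checks that the pieces fit together; meaningful metrics would require a large labeled dataset and a separate test split, as the abstract describes.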

