A Comprehensive Review of using Graph Neural Networks (GNNs) in Phishing Email Detection

Sarathchandra, Chanuthi

URI: http://dlib.iit.ac.lk/xmlui/handle/123456789/3042

Date: 2025

Abstract:

Problem: Email is a popular target for attackers because of the huge number of emails exchanged every day. Phishing is an increasingly effective way to exploit people's growing reliance on digital communication for financial, business, and personal relationships. Phishing attacks continue to evolve and are becoming harder to detect. The extensive use of cloud services and the rise in remote work caused by the coronavirus pandemic have increased the likelihood of phishing attacks. Outside protected office settings, employees are often more susceptible to phishing, and reliance on remote communication tools increases their exposure to such attacks.

Methodology: A Graph Neural Network (GNN) model was developed to detect phishing emails. Phishing emails and their associated data (such as email headers, links, sender information, and network patterns) often form complex relationships that can be naturally represented as graphs. GNNs are well suited to modeling such interconnected data, where nodes represent entities (e.g., sender email, recipient, URL) and edges represent relationships (e.g., sender-receiver interaction, URL-website link). This allows GNNs to identify the connections between different features of phishing emails. Phishing emails and the infrastructure that supports them are constantly changing: attackers often switch strategies by using new senders, IPs, or domains. Because GNNs can propagate new information across the graph and update relationships continuously, they are flexible in handling such evolving data. This makes them especially useful for identifying new or constantly evolving phishing patterns that static detection techniques can miss.

Initial Results: Initial evaluation of the Proof of Concept (PoC) achieved an accuracy of 83.33%, with a confusion matrix revealing a 33.33% false-positive rate and a 16.67% false-negative rate. The model's performance demonstrates its ability to correctly classify phishing and legitimate emails, with 3 true positives, 1 false positive, 1 false negative, and 2 true negatives.
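
The graph representation described under Methodology can be illustrated with a minimal sketch. This is not the model from the thesis: the field names (`sender`, `recipient`, `urls`), the `addr:`/`url:`/`domain:` node prefixes, the sender-to-URL edge, and the scalar "suspicion" score are all assumptions made for illustration. The `propagate` function performs one round of unweighted mean aggregation over neighbours, which stands in for the learned message passing a trained GNN would use.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical toy records; the field names are assumptions for illustration.
emails = [
    {"sender": "mallory@example.net", "recipient": "bob@example.org",
     "urls": ["http://login.example.com/verify"]},
    {"sender": "alice@example.org", "recipient": "bob@example.org", "urls": []},
]


def build_graph(records):
    """Build an undirected adjacency map of addresses, URLs, and domains."""
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for rec in records:
        sender = "addr:" + rec["sender"]
        recipient = "addr:" + rec["recipient"]
        link(sender, recipient)              # sender-receiver interaction
        for url in rec["urls"]:
            url_node = "url:" + url
            link(sender, url_node)           # assumed edge: sender used this URL
            host = urlparse(url).hostname
            if host:
                link(url_node, "domain:" + host)  # URL-website link
    return adj


def propagate(adj, scores):
    """One round of mean-aggregation message passing with no learned weights."""
    updated = {}
    for node, neighbours in adj.items():
        msgs = [scores.get(n, 0.0) for n in neighbours]
        neighbour_mean = sum(msgs) / len(msgs) if msgs else 0.0
        # Blend the node's own score with the average of its neighbours.
        updated[node] = 0.5 * scores.get(node, 0.0) + 0.5 * neighbour_mean
    return updated


graph = build_graph(emails)

# Assume one URL has just been reported as malicious; everything else starts at 0.
initial = {node: 0.0 for node in graph}
initial["url:http://login.example.com/verify"] = 1.0

round1 = propagate(graph, initial)
round2 = propagate(graph, round1)

for node in sorted(round2):
    print(f"{node}: {round2[node]:.4f}")
```

After one round, the score of the flagged URL reaches the sender and the domain attached to it; after a second round it reaches the recipient who interacted with that sender. A real GNN would replace the scalar score with feature vectors and the fixed 0.5 blend with trained weights, but the flow of information along edges is the same idea.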