Abstract:
Problem: Email is a popular target for attackers because of the huge number of messages
exchanged every day. Phishing is an increasingly effective tactic for exploiting people's
growing reliance on digital communication for financial, business, and personal relationships.
Phishing attacks also continue to evolve and are becoming harder to detect. The extensive use of
cloud services and the rise in remote work caused by the COVID-19 pandemic have increased the
likelihood of phishing attacks. Outside protected office settings, employees are often more
susceptible to phishing, and their use of remote communication tools increases their exposure
to such attacks.
Methodology: A Graph Neural Network (GNN) model was developed to detect phishing emails.
Phishing emails and their associated data (such as email headers, links, sender information, and
network patterns) often form complex relationships that can be naturally represented as graphs.
GNNs are well-suited for modeling such interconnected data where nodes represent entities (e.g.,
sender email, recipient, URL) and edges represent relationships (e.g., sender-receiver interaction,
URL-website link). This structure lets a GNN learn from the connections between different
features of phishing emails rather than from each feature in isolation. Phishing emails and the
infrastructure that supports them also change constantly: attackers often switch strategies by
using new senders, IP addresses, or domains. Because GNNs can propagate new information across
the graph and update relationships continuously, they can adapt to such evolving data. This
makes them especially useful for identifying new or rapidly changing phishing patterns that
static detection techniques can miss.
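
The graph view described above can be illustrated with a minimal sketch. It is not the model developed in this work: the toy emails, the field names, the node types, and the helper functions `build_graph` and `propagate` are all hypothetical, and the propagation step is plain unweighted neighbor averaging, a simplified stand-in for the learned message passing a GNN layer performs.

```python
from collections import defaultdict

# Hypothetical toy emails; the schema is illustrative, not the authors' dataset.
# label: 1 = phishing, 0 = legitimate, None = not yet classified.
emails = [
    {"id": "e1", "sender": "alice@corp.example", "recipient": "bob@corp.example",
     "urls": ["https://intranet.corp.example/wiki"], "label": 0},
    {"id": "e2", "sender": "billing@pay-secure.example", "recipient": "bob@corp.example",
     "urls": ["http://pay-secure.example/login"], "label": 1},
    {"id": "e3", "sender": "support@pay-secure.example", "recipient": "carol@corp.example",
     "urls": ["http://pay-secure.example/login"], "label": None},
]


def build_graph(emails):
    """Build an undirected adjacency map over typed nodes (email, sender, recipient, url)."""
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for mail in emails:
        email_node = ("email", mail["id"])
        link(email_node, ("sender", mail["sender"]))
        link(email_node, ("recipient", mail["recipient"]))
        for url in mail["urls"]:
            link(email_node, ("url", url))
    return adj


def propagate(adj, seed, rounds=2):
    """Spread scores by averaging over neighbors that already have a score.

    Seed nodes (emails with known labels) stay fixed. A real GNN layer would
    apply learned weights and a nonlinearity instead of a plain mean.
    """
    scores = dict(seed)
    for _ in range(rounds):
        updated = dict(scores)
        for node, neighbors in adj.items():
            if node in seed:
                continue
            known = [scores[n] for n in neighbors if n in scores]
            if known:
                updated[node] = sum(known) / len(known)
        scores = updated
    return scores


adj = build_graph(emails)
seed = {("email", m["id"]): float(m["label"]) for m in emails if m["label"] is not None}
scores = propagate(adj, seed, rounds=2)

# e3 has no label, but it shares a URL node with the known phishing email e2,
# so after two rounds it inherits that URL's score.
print(scores[("email", "e3")])  # 1.0
```

In this toy graph the unlabeled email is reached only through the shared URL node, which is the kind of relational signal a per-email feature vector would not capture. The recipient who received both a legitimate and a phishing email ends up with an intermediate score of 0.5.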
Initial Results: An initial evaluation of the proof of concept (PoC) achieved an accuracy of
83.33%, with a confusion matrix showing a 33.33% false-positive rate and a 16.67%
false-negative rate. The model correctly classified most phishing and legitimate emails in this
small test set, with 3 true positives, 1 false positive, 1 false negative, and 2 true negatives.
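
For reference, the sketch below computes the three rates from the four confusion-matrix counts using the standard definitions: accuracy = (TP + TN) / total, false-positive rate = FP / (FP + TN), and false-negative rate = FN / (FN + TP). The function name `confusion_metrics` is hypothetical. Under these definitions the counts listed above reproduce the 33.33% false-positive rate but give an accuracy of 71.43% and a false-negative rate of 25.00%. The reported 83.33% and 16.67% equal 5/6 and 1/6, so they may have been computed on a different number of samples or with a different denominator.

```python
from fractions import Fraction


def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, FPR, and FNR as exact fractions (standard definitions)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": Fraction(tp + tn, total),
        "false_positive_rate": Fraction(fp, fp + tn),
        "false_negative_rate": Fraction(fn, fn + tp),
    }


# Counts as listed in the abstract.
metrics = confusion_metrics(tp=3, fp=1, fn=1, tn=2)
for name, value in metrics.items():
    print(f"{name}: {value} = {float(value):.2%}")
# accuracy: 5/7 = 71.43%
# false_positive_rate: 1/3 = 33.33%
# false_negative_rate: 1/4 = 25.00%
```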