Digital Repository

SysPII: Transformer Enhanced PII Detection for System and Debug Logs

dc.contributor.author Panapitiya, Charuka
dc.date.accessioned 2025-07-02T06:02:08Z
dc.date.available 2025-07-02T06:02:08Z
dc.date.issued 2024
dc.identifier.citation Panapitiya, Charuka (2024) SysPII: Transformer Enhanced PII Detection for System and Debug Logs. MSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20210794
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2862
dc.description.abstract The management of Personally Identifiable Information (PII) within system and debugging logs is crucial due to stringent data protection regulations such as GDPR and PDPA. These logs often contain sensitive data that, if mishandled, can pose significant privacy risks. Existing methodologies for PII detection in logs are often inadequate, leading to challenges in both protecting privacy and complying with regulations. This research addresses these issues by enhancing the detection and anonymization of PII using advanced techniques. This project introduces SysPII, a prototype system designed to detect and anonymize PII in system and debugging logs. The system leverages a combination of advanced AI techniques and traditional rule-based methods using the Presidio framework. Specifically, a fine-tuned DistilBERT transformer model, adapted for log data, is utilized to improve detection accuracy. The implementation integrates these models within a Streamlit-based user interface, ensuring a user-friendly experience. Extensive evaluations of SysPII, including benchmarking against two Convolutional Neural Network (CNN) models optimized for accuracy and efficiency, demonstrated the superiority of the DistilBERT transformer model. SysPII achieved an accuracy of 0.97, precision of 0.91, recall of 0.83, and an F1 score of 0.87, with a ROC AUC of 0.91. These metrics highlight its effectiveness in accurately identifying and anonymizing PII within system logs, supporting its potential for real-world applications and compliance with data protection regulations. en_US
dc.language.iso en en_US
dc.subject PII detection en_US
dc.subject System logs en_US
dc.subject Data privacy en_US
dc.subject Transformer models en_US
dc.title SysPII: Transformer Enhanced PII Detection for System and Debug Logs en_US
dc.type Thesis en_US
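
Note on the reported metrics: the F1 score in the abstract can be checked against the stated precision and recall, since F1 is their harmonic mean. The short Python sketch below does only that arithmetic. The function name f1_score and the dictionary layout are illustrative assumptions, not taken from the dissertation, and the sketch says nothing about how SysPII itself computes its metrics.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from the abstract; nothing here is newly measured.
reported = {"precision": 0.91, "recall": 0.83, "f1": 0.87}

computed_f1 = f1_score(reported["precision"], reported["recall"])
print(f"computed F1 = {computed_f1:.4f}, reported F1 = {reported['f1']:.2f}")
```

Rounded to two decimals, the computed value agrees with the reported F1 of 0.87, so the three figures are mutually consistent.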

