Conference Papers

Conference Papers http://dlib.iit.ac.lk/xmlui/handle/123456789/2309 2026-04-28T21:21:55Z 2026-04-28T21:21:55Z

Using Machine Learning to Identify and Categorize Personally Identifiable Information and Payment Card Industry Data in Textual Content Arambawela, Milinda; Aponso, Achala http://dlib.iit.ac.lk/xmlui/handle/123456789/2266 2025-05-02T23:07:20Z 2024-01-01T00:00:00Z

Using Machine Learning to Identify and Categorize Personally Identifiable Information and Payment Card Industry Data in Textual Content Arambawela, Milinda; Aponso, Achala The rapid growth of online services has streamlined many daily tasks. Everyday activities, such as purchasing goods and scheduling appointments with healthcare professionals, have become faster, more efficient, and more user-friendly. As online services continue to improve, more people are moving their activities online. As a result, more personal and payment transaction data is being recorded across storage media such as databases and log files. This sensitive data must be protected and regulated in line with GDPR and PCI-DSS requirements, yet recognizing exposed personal data remains a considerable challenge. This research introduces a novel approach to identifying payment card industry (PCI) data and personally identifiable information (PII). It proposes a machine learning-based text classification model that uses a Convolutional Neural Network (CNN) to detect PII and PCI data within a given text. The CNN model was built and compared against Naive Bayes, Gradient Boost, Random Forest, and Support Vector Machine (SVM) models. The CNN model achieved the highest accuracy, 0.96 (96%). Its per-class F1 scores were also high: 0.94 for PII, 0.95 for PCI, and 0.99 for Normal. After training, the model was deployed in a classification tool together with the saved tokenizer word index and label encoders. The tool performed as intended, identifying exposed PII and PCI data.
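The abstract mentions reusing a saved tokenizer word index and label encoders at inference time. The following is a minimal sketch of that step only, not the authors' implementation: the vocabulary, the label order, the sequence length, and the choice of post-padding are all assumptions made for illustration, and the trained CNN itself is replaced by a hand-written score vector.

```python
# Hypothetical saved artefacts; a real tool would load these from disk.
WORD_INDEX = {"<oov>": 1, "card": 2, "number": 3, "my": 4, "name": 5, "is": 6}
LABELS = ["Normal", "PCI", "PII"]  # assumed to match the model's output order
MAX_LEN = 6                        # assumed fixed input length


def encode(text, word_index=WORD_INDEX, max_len=MAX_LEN):
    """Map lowercase whitespace tokens to integer ids, then truncate or pad."""
    oov = word_index["<oov>"]
    ids = [word_index.get(tok, oov) for tok in text.lower().split()]
    ids = ids[:max_len]
    # Post-padding with 0 is an assumption; the original tool may pad differently.
    return ids + [0] * (max_len - len(ids))


def decode_label(scores, labels=LABELS):
    """Return the label whose score is highest (the label-encoder inverse)."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Usage: encode a sentence, then decode a stand-in score vector.
sequence = encode("My card number is 4111")
label = decode_label([0.1, 0.8, 0.1])
```

Keeping the same word index and label order at training and inference time is what makes the predicted class index meaningful; a mismatch in either would silently mislabel the output.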

2024-01-01T00:00:00Z Parameter Efficient Diverse Paraphrase Generation Using Sequence-Level Knowledge Distillation Jayawardena, Lasal; Yapa, Prasan http://dlib.iit.ac.lk/xmlui/handle/123456789/2264 2025-05-02T23:09:11Z 2024-01-01T00:00:00Z

Parameter Efficient Diverse Paraphrase Generation Using Sequence-Level Knowledge Distillation Jayawardena, Lasal; Yapa, Prasan Over the past year, the field of Natural Language Generation (NLG) has surged, largely due to the introduction of Large Language Models (LLMs). These models have delivered the strongest performance across a range of Natural Language Processing and Generation tasks. However, applying them to domain-specific tasks such as paraphrasing presents significant challenges. Their large parameter counts make them difficult to run on commercial hardware, and their long inference times lead to high costs in a production setting. In this study, we tackle these obstacles by using LLMs to develop three distinct paraphrasing models through sequence-level knowledge distillation. The distilled models maintain the quality of the paraphrases generated by the LLM while offering faster inference and the ability to generate diverse paraphrases of comparable quality. A notable characteristic of these models is that they exhibit syntactic diversity while preserving lexical diversity, a combination previously uncommon because of data quality issues in existing datasets and not typically observed in neural approaches. Human evaluation shows only a 4% drop in performance compared to the LLM teacher model used in the distillation process, even though the distilled models are 1000 times smaller. This research contributes a more efficient and cost-effective solution for paraphrasing tasks in the NLG field.
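In sequence-level knowledge distillation, the student is trained on complete output sequences produced by the teacher rather than on the teacher's token-level probabilities. The sketch below illustrates only the data-collection side of that idea under stated assumptions: `stub_teacher` is a hypothetical stand-in for an LLM call, and the filtering rules (drop empty, copied, and duplicate outputs) are illustrative choices, not steps taken from the paper.

```python
from typing import Callable, Iterable, List, Tuple


def build_distillation_pairs(
    sources: Iterable[str],
    teacher: Callable[[str, int], List[str]],
    per_source: int = 3,
) -> List[Tuple[str, str]]:
    """Collect (source, teacher paraphrase) pairs for training a student model."""
    pairs = []
    for src in sources:
        seen = set()
        for candidate in teacher(src, per_source):
            text = candidate.strip()
            key = text.lower()
            # Skip empty outputs, copies of the input, and repeated paraphrases.
            if not key or key == src.strip().lower() or key in seen:
                continue
            seen.add(key)
            pairs.append((src, text))
    return pairs


def stub_teacher(sentence: str, n: int) -> List[str]:
    """Hypothetical teacher: returns canned 'paraphrases' instead of calling an LLM."""
    canned = [
        sentence,
        f"{sentence} indeed",
        f"{sentence} indeed",
        f"Indeed, {sentence}",
    ]
    return canned[:n]


pairs = build_distillation_pairs(["it rains"], stub_teacher, per_source=4)
```

The resulting pairs would then serve as ordinary supervised training data for a much smaller sequence-to-sequence student.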

2024-01-01T00:00:00Z Deep Learning-powered Mobile App for Early Brahmi Script Decipherment in Sri Lanka Gunasekara, Sakith; Lafir, Muhammed Haleef; Dulaj, Chavindu; Haputhanthri, Lakidu; Alwis, Dileeka http://dlib.iit.ac.lk/xmlui/handle/123456789/2259 2025-05-03T02:16:00Z 2024-01-01T00:00:00Z

Deep Learning-powered Mobile App for Early Brahmi Script Decipherment in Sri Lanka Gunasekara, Sakith; Lafir, Muhammed Haleef; Dulaj, Chavindu; Haputhanthri, Lakidu; Alwis, Dileeka This ongoing research examines ancient Brahmi writings carved on stone surfaces, focusing on early Brahmi characters, which serve as critical artifacts illuminating the island’s historical ties with India and the religious and cultural practices of the era. The study aims to develop a mobile application that uses deep learning to recognize and translate early Brahmi characters, improving the efficiency and accuracy of decipherment. A comprehensive literature review identified several research gaps: the absence of a mobile application, the challenges of contextual translation, the difficulty of accurately predicting translations for damaged inscriptions, and translation into foreign languages. Drawing on interdisciplinary expertise and stakeholder analysis, the research tackles the challenges of script recognition, linguistic translation, and accurate prediction through a data-driven approach, cutting-edge algorithms, and user-centric design principles. After creating a refined, precise digitized dataset of early Brahmi characters and their observed variations, the research aims to recognize characters with semantic segmentation techniques implemented in TensorFlow and Keras, using OpenCV for image preprocessing and the Flutter framework for mobile application development. Expected outcomes include improved recognition accuracy, linguistically faithful translations, and user-friendly interfaces, contributing to advancements in digital humanities, cultural preservation, and computational linguistics.

2024-01-01T00:00:00Z Swa Bhasha 2.0: Addressing Ambiguities in Romanized Sinhala to Native Sinhala Transliteration Using Neural Machine Translation Dharmasiri, Sachithya; Sumanathilaka, T.G.D.K. http://dlib.iit.ac.lk/xmlui/handle/123456789/2258 2025-05-03T02:17:36Z 2024-01-01T00:00:00Z

Swa Bhasha 2.0: Addressing Ambiguities in Romanized Sinhala to Native Sinhala Transliteration Using Neural Machine Translation Dharmasiri, Sachithya; Sumanathilaka, T.G.D.K. With the growing popularity of social media and instant messaging, it is more important than ever to be able to interact online in one's native language. Sinhala is widely written online in both Romanized and native script. However, because of the informal, abbreviated style known as “Singlish”, machine transliteration of Romanized Sinhala into native Sinhala is error-prone, and rule-based transliteration systems may not be compatible with the ad hoc spellings used in Singlish. To convert Romanized Sinhala back into native Sinhala precisely and consistently, a novel NMT approach is proposed: a hybrid strategy combining rule-based and neural machine translation that addresses the complexities of casual Romanized Sinhala. The strategy aims to eliminate word selection ambiguity by using a suggestion algorithm to select the best word from a pool of predicted words. By combining the strengths of suggestion algorithms and neural machine translation, the proposed transliterator has the potential to considerably enhance reverse transliteration and improve communication in native Sinhala. After the GRU model was completed, the BLEU score of the machine translation models improved to 0.8, indicating high word-level translation accuracy. While these preliminary results are promising, additional testing and refinement are required to improve the overall efficacy of the machine translation models.
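The abstract does not describe how the suggestion algorithm ranks its pool of predicted words, so the following is only a hypothetical sketch of one way such a selection step could work: each Romanized token has a pool of native candidates, and the candidate that best fits the previously chosen word is picked. The candidate pools and the frequency counts are invented for illustration and are not taken from the paper.

```python
# Hypothetical candidate pools: Romanized token -> possible native Sinhala words.
CANDIDATES = {
    "mama": ["මම", "මාමා"],  # illustrative ambiguity: "I" vs "uncle"
    "loku": ["ලොකු"],
    "gedara": ["ගෙදර"],
}

# Invented corpus counts, used only to rank candidates in this sketch.
UNIGRAM = {"මම": 50, "මාමා": 10, "ලොකු": 20, "ගෙදර": 30}
BIGRAM = {("ලොකු", "මාමා"): 8, ("මම", "ගෙදර"): 12}


def pick_words(romanized):
    """Greedily choose one candidate per token, preferring context fit."""
    chosen = []
    for token in romanized.lower().split():
        pool = CANDIDATES.get(token, [token])  # unknown tokens pass through
        prev = chosen[-1] if chosen else None

        def score(word):
            # Bigram fit with the previous word first, unigram count as tie-break.
            return (BIGRAM.get((prev, word), 0), UNIGRAM.get(word, 0))

        chosen.append(max(pool, key=score))
    return chosen


without_context = pick_words("mama gedara")
with_context = pick_words("loku mama")
```

In this toy setup the same Romanized token resolves to different native words depending on the preceding word, which is the kind of word selection ambiguity the abstract refers to; a neural model could supply the candidate pool in place of the fixed dictionary used here.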

2024-01-01T00:00:00Z