2024 Conference Papers & Journal Articles

http://dlib.iit.ac.lk/xmlui/handle/123456789/2308
2026-04-28T18:04:39Z

Using Machine Learning to Identify and Categorize Personally Identifiable Information and Payment Card Industry Data in Textual Content
http://dlib.iit.ac.lk/xmlui/handle/123456789/2266
Arambawela, Milinda; Aponso, Achala
The advent of the Internet has significantly streamlined daily tasks through the rapid increase of online services. Everyday activities, such as purchasing goods and scheduling appointments with healthcare professionals, have become faster, more efficient, and more user-friendly with the integration of the Internet. The continuous improvement of online services has led many people to move towards digital activities. As a result, more personal and payment transaction data is recorded across various storage media, including databases and log files. Protecting and regulating this sensitive data is imperative, in line with the GDPR and PCI-DSS requirements. Recognizing exposed personal data poses a considerable challenge. This research introduces a novel approach to identifying payment card industry (PCI) data and personally identifiable information (PII). The research project proposes a machine learning-based text classification model that uses a Convolutional Neural Network (CNN) to detect PII and PCI data within a given text. The CNN model was built and compared against Naive Bayes, Gradient Boost, Random Forest, and Support Vector Machine (SVM) models. The CNN model achieved the highest accuracy at 0.96 (96%). Additionally, the per-class F1 scores were high: 0.94 for PII, 0.95 for PCI, and 0.99 for Normal.
After the model was built and trained, it was deployed in the developed classification tool together with the saved tokenizer's word index and label encoders. This tool delivered the intended results, identifying exposed PII and PCI data.
2024-01-01T00:00:00Z

Cyberbullying Detection System on Social Media Using Supervised Machine Learning
http://dlib.iit.ac.lk/xmlui/handle/123456789/2265
Perera, Andrea; Fernando, Pumudu
The use of digital and social media is growing every day as technology advances. People in the twenty-first century are growing up in a social media and internet-enabled society. Digital media offers many opportunities, but people frequently misuse them. On social networking sites, people direct anger at individuals. Cyberbullying affects people in various ways. Its impact extends beyond health; numerous other factors put life in danger. Cyberbullying is a widespread modern phenomenon that people cannot completely avoid but can prevent. The authors propose a system for automatic cyberbullying detection and prevention using supervised machine learning. The system considers key characteristics of cyberbullying, such as the intention to harm, repeated behavior, and the use of abusive language. Support vector machines and logistic regression are employed to identify cyberbullying and related themes/categories such as race, physical, sexuality, and politics. The proposed method offers a novel theory for the detection of cyberbullying: texting has evolved over time due to changes in context usage and language. On a dataset of tweets, Support Vector Machine (SVM), Naïve Bayes, and Logistic Regression (LR) models were tested with different Natural Language Processing methods.
The accuracy of the system is improved by sentiment analysis, N-gram analysis, and other non-traditional feature extraction methods such as Term Frequency-Inverse Document Frequency (TF-IDF) and profanity detection.
2024-01-01T00:00:00Z

Parameter Efficient Diverse Paraphrase Generation Using Sequence-Level Knowledge Distillation
http://dlib.iit.ac.lk/xmlui/handle/123456789/2264
Jayawardena, Lasal; Yapa, Prasan
Over the past year, the field of Natural Language Generation (NLG) has experienced an exponential surge, largely due to the introduction of Large Language Models (LLMs). These models have delivered the strongest performance across a range of tasks in Natural Language Processing and Generation. However, their application in domain-specific tasks, such as paraphrasing, presents significant challenges. Their large number of parameters makes them difficult to run on commercial hardware, and they require substantial time for inference, leading to high costs in a production setting. In this study, we tackle these obstacles by employing LLMs to develop three distinct models for paraphrasing, applying a method referred to as sequence-level knowledge distillation. These distilled models maintain the quality of the paraphrases generated by the LLM. They demonstrate faster inference times and the ability to generate diverse paraphrases of comparable quality. A notable characteristic of these models is that they exhibit syntactic diversity while also preserving lexical diversity, a combination previously uncommon because of data quality issues in existing datasets and not typically observed in neural approaches. Human evaluation of our models shows only a 4% drop in performance compared to the LLM teacher model used in the distillation process, even though our models are 1000 times smaller.
This research contributes to the NLG field by offering a more efficient and cost-effective solution for paraphrasing tasks.
2024-01-01T00:00:00Z

Game-based Activity Design in Primary School Students’ Learning Style Detection
http://dlib.iit.ac.lk/xmlui/handle/123456789/2263
Fernando, Pumudu A.; Premadasa, H.K. Salinda
Generation Alpha, the present primary school cohort born after 2010, has significant exposure to mobile devices and gaming. Adopting a "One Size Fits All" approach in modern teaching methods may not be effective, as it overlooks individual learning preferences. Personalized learning can be facilitated by identifying a student’s learning style (LS). Several studies have found that adaptive learning based on LS has positive effects. However, traditional learning style detection techniques such as questionnaires and self-assessments can be time-consuming and demotivating for primary school students. This study proposes a game-based activity framework as an alternative to the Index of Learning Styles (ILS) questionnaire associated with the Felder-Silverman Learning Style Model for LS detection. The proposed game was evaluated with a sample of sixty students, and preliminary results indicate that the game outperforms the original ILS questionnaire in terms of student engagement and motivation to complete LS activities, achieving an overall satisfaction rate of 87.5%. The second phase of the research, which is currently ongoing, focuses on evaluating the accuracy of LS prediction using the designed game.
2024-01-01T00:00:00Z
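
For readers unfamiliar with the ILS questionnaire that the last entry uses as its baseline, the following is a minimal sketch of how a completed questionnaire is commonly scored. It assumes the usual description of the instrument: 44 two-option (a/b) items, 11 per Felder-Silverman dimension, with items cycling through the four dimensions in a fixed order. The names `DIMENSIONS`, `score_ils`, and `describe`, the item-to-dimension mapping, and the strength bands are illustrative assumptions to check against the official instrument. This is not the game-based framework proposed by the authors.

```python
# Assumption: items cycle through the four dimensions in this order
# (item 1 -> first dimension, item 2 -> second, and so on).
DIMENSIONS = (
    "active/reflective",
    "sensing/intuitive",
    "visual/verbal",
    "sequential/global",
)


def score_ils(answers):
    """Return one signed score per dimension from 44 'a'/'b' answers.

    Each 'a' adds 1 and each 'b' subtracts 1, so every dimension ends
    with an odd score between -11 and +11. A positive score leans
    toward the first pole of the pair, a negative one toward the second.
    """
    if len(answers) != 44:
        raise ValueError("expected 44 answers")
    scores = dict.fromkeys(DIMENSIONS, 0)
    for index, answer in enumerate(answers):
        if answer not in ("a", "b"):
            raise ValueError(f"item {index + 1}: answer must be 'a' or 'b'")
        scores[DIMENSIONS[index % 4]] += 1 if answer == "a" else -1
    return scores


def describe(score):
    """Map a signed score to a preference strength band (assumed bands)."""
    strength = abs(score)
    if strength <= 3:
        return "balanced"
    if strength <= 7:
        return "moderate"
    return "strong"


if __name__ == "__main__":
    # Hypothetical respondent: always 'a' on the first dimension, 'b' elsewhere.
    sample = ["a" if i % 4 == 0 else "b" for i in range(44)]
    for dimension, score in score_ils(sample).items():
        print(dimension, score, describe(score))
```

A game-based alternative would need to produce comparable per-dimension scores for its predictions to be evaluated against this baseline.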