Digital Repository

Raccoon AI: Identifying Unintended Queries in Large Language Models Using Prompt Engineering


dc.contributor.author Alwis, Muthula
dc.date.accessioned 2025-06-12T05:40:17Z
dc.date.available 2025-06-12T05:40:17Z
dc.date.issued 2024
dc.identifier.citation Alwis, Muthula (2024) Raccoon AI: Identifying Unintended Queries in Large Language Models Using Prompt Engineering. BSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20200513
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2523
dc.description.abstract "In the midst of the transformative era shaped by remarkable advancements in artificial intelligence (AI) and machine learning (ML), the advent of generative AI technology has propelled society into an age where the creation of diverse content, including text, images, videos, and audio, is facilitated at an unprecedented scale. This technological surge has given rise to prominent AI models such as ChatGPT, Dall-E, and BardAI, garnering widespread popularity across diverse user segments. However, the innovative appropriation of these models, veering away from their intended purposes, has created a complex interplay within the technological landscape. Users, driven by the creative human spirit, have employed these models in novel ways, often manipulating prompts to access unintended and potentially restricted information, raising concerns about user safety and ethical considerations. This research delves into the intricate realm of Natural Language Processing (NLP) to embark on a comprehensive investigation. The primary objective is to develop a robust defense mechanism against unintentional data exposure by identifying and mitigating unintended queries. The overarching aim is to foster a more secure and responsible AI ecosystem, mitigating the ethical consequences arising from the inadvertent misuse of expansive language models. In addressing the challenge of identifying unintended prompts in the Gemini model, a classification methodology was employed utilizing the Random Forests algorithm. The approach involved assembling a diverse dataset encompassing both intended and unintended prompts, followed by feature engineering to extract relevant characteristics from the prompts. After preprocessing, including data cleaning and balancing, the model was trained on this dataset, employing Random Forests for its ability to handle complex relationships within the features.
The model was fine-tuned iteratively, optimizing its ability to discern between intended and unintended queries. This classification methodology provides a robust framework for identifying and mitigating potential security risks associated with unintended prompts, contributing to the creation of a more secure and responsible AI ecosystem. The preliminary results of the classification methodology utilizing Random Forests revealed promising outcomes. Metrics derived from the confusion matrix, including accuracy, precision, recall, and F1-score, were computed to assess the model's performance in distinguishing between intended and unintended prompts. The model demonstrated a commendable ability to minimize false positives and false negatives, indicative of its efficacy in identifying potential security risks associated with unintended queries. Additionally, the Area Under the Receiver Operating Characteristic Curve (AUC-ROC) was employed to evaluate the model's overall discriminative power. These quantitative metrics serve as a foundation for the ongoing refinement of the classification model, illustrating its potential to contribute significantly to the goal of enhancing security in the context of the Gemini model." en_US
dc.language.iso en en_US
dc.subject Unintended Prompt Detection en_US
dc.subject Generative AI Security en_US
dc.subject Gemini Safety en_US
dc.title Raccoon AI: Identifying Unintended Queries in Large Language Models Using Prompt Engineering en_US
dc.type Thesis en_US
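
The abstract reports accuracy, precision, recall, F1-score, and AUC-ROC for a binary classifier that separates intended from unintended prompts. The following is a minimal sketch of how such metrics can be computed from labels and classifier scores. It is not the dissertation's code: the function names, the convention that label 1 means "unintended", the 0.5 threshold, and the sample data are all assumptions made for illustration, and the scores stand in for whatever probabilities a trained Random Forest would output.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating label 1 (assumed: unintended prompt) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, precision, recall, and F1 derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels (1 = unintended) and classifier scores, for illustration only.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
    preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold
    print(classification_metrics(labels, preds))
    print(roc_auc(labels, scores))
```

The pairwise formulation of AUC-ROC is quadratic in the number of examples, which is acceptable for a small illustration; a library routine would normally be used on a real dataset.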

