“PeaceKeeper” : Transformer-Based Multimodal Public Violence Detection System

Wickrama Arachchi, Hashini

dc.contributor.author	Wickrama Arachchi, Hashini
dc.date.accessioned	2025-06-09T03:26:25Z
dc.date.available	2025-06-09T03:26:25Z
dc.date.issued	2024
dc.identifier.citation	Wickrama Arachchi, Hashini (2024) “PeaceKeeper” : Transformer-Based Multimodal Public Violence Detection System. BSc. Dissertation, Informatics Institute of Technology	en_US
dc.identifier.issn	20200477
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/2468
dc.description.abstract	Violence is a common occurrence in public spaces where groups of people gather. When violence occurs, the public lacks a prompt alerting mechanism and cannot immediately obtain the evidence about the culprits and the affected persons or property that legal investigations require. Violent incidents unfold swiftly and last only a short time, which underlines the need for accurate, real-time violence detection (VD) systems capable of detecting both ongoing and future violence without any human intervention. This project proposes a system that combines three Transformer architectures; Transformers are a recent attention mechanism-based technology in the computer vision domain. Video classification for fight detection is implemented using the Video Vision Transformer (ViViT), image classification for weapon classification is built using the Vision Transformer (ViT), and violent audio classification is to be achieved using the Audio Spectrogram Transformer (AST). Currently, the implemented ViViT model achieves an accuracy of 60.0% over 50 epochs on a V100 GPU. The ViT model achieves 99.53% overall accuracy across the multiple classes. The AST model achieves an overall accuracy of 59.06% for audio classification.	en_US
dc.language.iso	en	en_US
dc.subject	Transformers	en_US
dc.subject	Violence Detection	en_US
dc.subject	Video Vision Transformer	en_US
dc.title	“PeaceKeeper” : Transformer-Based Multimodal Public Violence Detection System	en_US
dc.type	Thesis	en_US