Wav2Deep - Enhanced Audio Deepfake Detection through Self-Supervised Learning

Fowzan, Nishad

dc.contributor.author	Fowzan, Nishad
dc.date.accessioned	2025-06-20T10:13:51Z
dc.date.available	2025-06-20T10:13:51Z
dc.date.issued	2024
dc.identifier.citation	Fowzan, Nishad (2024) Wav2Deep - Enhanced Audio Deepfake Detection through Self-Supervised Learning. BSc. Dissertation, Informatics Institute of Technology	en_US
dc.identifier.issn	2019545
dc.identifier.uri	http://dlib.iit.ac.lk/xmlui/handle/123456789/2703
dc.description.abstract	This research project addresses the pressing need for robust countermeasures against audio spoofing, focusing on the emerging field of audio deepfake detection (ADD). Despite recent progress in using self-supervised speech models for feature extraction, current approaches struggle with multi-speaker tasks and cross-domain conditions, which limits their effectiveness in real-world scenarios. This project proposes integrating WavLM, a state-of-the-art self-supervised speech model, as the front-end feature extractor for ADD. Because WavLM is pre-trained with masked speech prediction and denoising, it captures features beyond those needed for automatic speech recognition (ASR), which improves the robustness of ADD systems. Moreover, to address the challenge of generalizing to unfamiliar target domains with limited source data, this project builds a framework for training and evaluating detection models on custom data, so that researchers can create their own models for domain-specific scenarios. After training for 10 epochs on a selection of datasets containing around 11,000 samples each, the model with the WavLM front end detects audio deepfakes with average EER and min t-DCF scores of 0.152237 and 0.36142 respectively, outperforming both comparable speech-based SSL models and benchmark models when tested on the popular datasets ASVspoof 2019/2021 and WaveFake. The research proved successful, and the proposed model is 11 times smaller than the next closest model.	en_US
dc.language.iso	en	en_US
dc.subject	Self Supervised Learning	en_US
dc.subject	Deep Learning	en_US
dc.subject	Audio Deepfake Detection	en_US
dc.title	Wav2Deep - Enhanced Audio Deepfake Detection through Self-Supervised Learning	en_US
dc.type	Thesis	en_US
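The abstract reports detection performance as an equal error rate (EER). To illustrate what that metric measures, here is a minimal Python sketch. It is not the thesis's evaluation code, which this record does not include. It assumes that a higher score means "more likely bona fide", and the function name `compute_eer` is hypothetical. It sweeps every observed score as a threshold and reports the point where the false acceptance rate (spoofs accepted) and the false rejection rate (bona fide rejected) are closest. It does not compute min t-DCF, which also depends on the ASV system and cost parameters.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Approximate the equal error rate (EER) by a threshold sweep.

    Hypothetical helper for illustration. Assumes higher scores mean
    "more likely bona fide". Returns (eer, threshold) at the threshold
    where FAR and FRR are closest, averaging the two at that point.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate threshold; +inf covers "reject all".
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    thresholds.append(float("inf"))

    best_gap, best_eer, best_thr = float("inf"), None, None
    for thr in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < thr for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoof trials scored at or above the threshold.
        far = sum(s >= thr for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
    print(f"EER={eer:.2f} at threshold {thr}")
```

With these toy scores, one bona fide trial (0.4) is rejected and one spoof (0.6) is accepted at threshold 0.6, giving an EER of 0.25. This sweep is quadratic in the number of trials. For full ASVspoof-sized score files, a sorted, cumulative-count approach (or an ROC utility) is the usual choice.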