Digital Repository

Wav2Deep - Enhanced Audio Deepfake Detection through Self-Supervised Learning

dc.contributor.author Fowzan, Nishad
dc.date.accessioned 2025-06-20T10:13:51Z
dc.date.available 2025-06-20T10:13:51Z
dc.date.issued 2024
dc.identifier.citation Fowzan, Nishad (2024) Wav2Deep - Enhanced Audio Deepfake Detection through Self-Supervised Learning. BSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 2019545
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2703
dc.description.abstract This research project addresses the pressing need for robust countermeasures against audio spoofing, focusing on the emerging field of audio deepfake detection (ADD). Despite recent progress in using self-supervised speech models for feature extraction, current approaches handle multi-speaker tasks poorly and struggle under cross-domain conditions, which limits their effectiveness in real-world scenarios. This project proposes integrating WavLM, a self-supervised speech model, as a front-end feature extractor for ADD. Trained with techniques such as masked speech prediction and denoising, WavLM captures non-ASR (automatic speech recognition) features more effectively, thereby enhancing the robustness of ADD systems. Moreover, to address the challenge of generalizing to unfamiliar target domains with limited source data, this project explores a framework for training and evaluating detection models on custom data, so that researchers can create their own models for domain-specific scenarios. After training for 10 epochs on a selection of datasets containing around 11,000 samples each, the model with the WavLM front end detects audio deepfakes with average [EER / min-tDCF] scores of (0.152237, 0.36142), outperforming both similar speech-based SSL models and benchmark models when tested on the popular datasets ASVspoof 2019/2021 and WaveFake. The research proved successful, with the proposed model being 11 times smaller than the next closest model. en_US
dc.language.iso en en_US
dc.subject Self Supervised Learning en_US
dc.subject Deep Learning en_US
dc.subject Audio Deepfake Detection en_US
dc.title Wav2Deep - Enhanced Audio Deepfake Detection through Self-Supervised Learning en_US
dc.type Thesis en_US

