ViT-DiverseGAN: A Multifaceted Vision Transformer Enhanced Unpaired Conditional Optimized Stable Generative Adversarial Network for Universal Image-to-Image Translation with Augmented Diversity in Face Recognition Training

dc.contributor.author Edirisinghe, Dilina
dc.date.accessioned 2025-06-06T04:23:20Z
dc.date.available 2025-06-06T04:23:20Z
dc.date.issued 2024
dc.identifier.citation Edirisinghe, Dilina (2024) ViT-DiverseGAN: A Multifaceted Vision Transformer Enhanced Unpaired Conditional Optimized Stable Generative Adversarial Network for Universal Image-to-Image Translation with Augmented Diversity in Face Recognition Training. BSc. Dissertation, Informatics Institute of Technology en_US
dc.identifier.issn 20200520
dc.identifier.uri http://dlib.iit.ac.lk/xmlui/handle/123456789/2449
dc.description.abstract This research tackles the challenge of diversifying face recognition (FR) training through optimized, cost-effective data augmentation (DA). Existing DA techniques lack diversity, prompting the creation of ViT-DiverseGAN, a novel generative adversarial network (GAN) that integrates vision transformer (ViT) technology. ViT-DiverseGAN efficiently extracts global features from images, overcoming training instability and mode collapse, and offers a robust solution for diverse real-world conditions and demographics. To address diversity limitations, resource intensiveness, training instability, and mode collapse in GANs, the approach employs a conditional GAN with two ViT-based generators and discriminators for multi-domain translation. The architecture builds its generators on U-Net and MODNet networks and pairs them with ViT-standardized PatchGAN discriminators. Training incorporates a ViT-enhanced masked autoencoder model, which improves stability and image diversity. This methodology yields a computationally feasible framework for training face recognition models that addresses the complexities of real-world scenarios. Tests on the CelebA-HQ dataset demonstrate the proposed model's efficacy: it achieves FID scores of 17.07/16.27 and KID (×100) scores of 0.12/0.16 on gender transformation tasks. For eyeglass addition and removal on the CelebA dataset, the model scores 11.04/10.25 FID and 0.26/0.21 KID (×100). ViT-DiverseGAN keeps training to 500 epochs on any dataset while surpassing state-of-the-art models in multi-domain translation. Augmentation with eyeglass addition and removal boosts face recognition accuracy to 97.74% on the MeGlass dataset, as assessed by FaceNet benchmarking, showcasing its real-world efficacy. en_US
dc.language.iso en en_US
dc.subject Generative Adversarial Networks en_US
dc.subject Vision Transformers en_US
dc.subject Data Augmentation en_US
dc.title ViT-DiverseGAN: A Multifaceted Vision Transformer Enhanced Unpaired Conditional Optimized Stable Generative Adversarial Network for Universal Image-to-Image Translation with Augmented Diversity in Face Recognition Training en_US
dc.type Thesis en_US
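
The record above contains no source code. As a rough illustration only, the following is a minimal PyTorch sketch of the kind of ViT-standardized PatchGAN discriminator the abstract describes: the image is split into patch tokens, processed with transformer self-attention to capture global structure, and scored with one real/fake logit per patch. Every name and hyperparameter here (ViTPatchDiscriminator, patch_size, embed_dim, depth, and so on) is an assumption made for illustration, not the dissertation's implementation.

    # Hypothetical sketch of a ViT-style PatchGAN discriminator (not the author's code).
    import torch
    import torch.nn as nn

    class ViTPatchDiscriminator(nn.Module):
        def __init__(self, img_size=256, patch_size=16, in_ch=3,
                     embed_dim=256, depth=4, heads=4):
            super().__init__()
            n_patches = (img_size // patch_size) ** 2
            # Non-overlapping patch embedding via a strided convolution.
            self.embed = nn.Conv2d(in_ch, embed_dim,
                                   kernel_size=patch_size, stride=patch_size)
            # Learned positional embedding, one vector per patch token.
            self.pos = nn.Parameter(torch.zeros(1, n_patches, embed_dim))
            layer = nn.TransformerEncoderLayer(
                d_model=embed_dim, nhead=heads,
                dim_feedforward=embed_dim * 4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
            # One real/fake logit per patch, mirroring PatchGAN's patch-level decisions.
            self.head = nn.Linear(embed_dim, 1)

        def forward(self, x):
            tokens = self.embed(x).flatten(2).transpose(1, 2)  # (B, N, D)
            tokens = self.encoder(tokens + self.pos)           # global self-attention
            return self.head(tokens).squeeze(-1)               # (B, N) patch logits

    # Usage: per-patch logits for a batch of two 256x256 face crops.
    d = ViTPatchDiscriminator()
    logits = d(torch.randn(2, 3, 256, 256))   # shape: (2, 256)

Relative to a convolutional PatchGAN, whose receptive field limits each patch decision to local context, the self-attention layers let every patch logit condition on the whole image, which is one plausible reading of the global-feature claim in the abstract.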
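The abstract reports FID and KID (×100) scores on CelebA-HQ and CelebA. The dissertation's exact evaluation protocol is not given in this record; as a hedged sketch of how such scores are commonly computed, the snippet below uses the torchmetrics library's FrechetInceptionDistance and KernelInceptionDistance metrics, with random tensors standing in for real and generated face images.

    # Sketch of FID/KID evaluation with torchmetrics (not the dissertation's protocol).
    # Requires: pip install torchmetrics torch-fidelity
    import torch
    from torchmetrics.image.fid import FrechetInceptionDistance
    from torchmetrics.image.kid import KernelInceptionDistance

    # feature=64 keeps the covariance estimate stable for this tiny toy sample;
    # published scores typically use the default 2048-dim Inception features.
    fid = FrechetInceptionDistance(feature=64, normalize=True)
    kid = KernelInceptionDistance(subset_size=50, normalize=True)

    real = torch.rand(100, 3, 299, 299)   # placeholder for real face images in [0, 1]
    fake = torch.rand(100, 3, 299, 299)   # placeholder for generated images

    fid.update(real, real=True)
    fid.update(fake, real=False)
    kid.update(real, real=True)
    kid.update(fake, real=False)

    kid_mean, kid_std = kid.compute()
    print(f"FID: {fid.compute():.2f}")
    print(f"KID x100: {kid_mean * 100:.2f}")   # matches the abstract's (x100) convention

The ×100 scaling on KID in the abstract follows the common reporting convention, since raw KID values are typically small fractions.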