dc.description.abstract |
"This research tackles the challenge of diversifying face recognition (FR) training via
optimized, cost-effective data augmentation (DA). Existing DA techniques lack diversity,
prompting the creation of ViT-DiverseGAN, a novel generative adversarial network (GAN)
integrating vision transformer (ViT) technology. ViT-DiverseGAN efficiently extracts global
features from images, overcoming training instability and mode collapse. It's a robust
solution for diverse real-world conditions and demographics.
To address the diversity challenges, resource intensiveness, training instability, and mode
collapse of GANs, a novel approach was devised: a conditional GAN with two ViT-based
generators and two discriminators for multi-domain translation. The architecture integrates
U-Net and MODNet networks into the generators and employs ViT-standardized PatchGAN
discriminators. Training incorporates a ViT-enhanced masked autoencoder model, improving
stability and image diversity. This methodology yields a computationally feasible framework
for training face recognition models that addresses the complexities of real-world scenarios.
Tests conducted on the CelebA-HQ dataset demonstrate the proposed model's efficacy,
achieving FID scores of 17.07 and 16.27 and KID (x100) scores of 0.12 and 0.16 on gender
transformation tasks. For eyeglass addition and removal on the CelebA dataset, the model
achieves FID scores of 11.04 and 10.25 and KID (x100) scores of 0.26 and 0.21.
ViT-DiverseGAN trains within a fixed 500 epochs on any dataset while surpassing
state-of-the-art models in multi-domain translation. Augmentation for eyeglass addition and
removal boosts face recognition accuracy to 97.74% on the MeGlass dataset, as assessed by
FaceNet benchmarking, showcasing its real-world efficacy." |
en_US |