Abstract:
In recent years, text-to-image generation has attracted considerable attention, and diffusion models have proven to be the state of the art in this domain. However, owing to their high computational demands, most current research focuses on improving their efficiency and image quality. Existing text-to-image solutions have also been found to offer very limited usability and applicability because they provide little control over the generation process. This project addresses the lack of control and customization in text-to-image diffusion models by developing a solution that enhances their controllability and customizability.
This project proposes a unified architecture and pipeline that combines multiple fine-tuning techniques to enable both subject personalization and conditional control. Subject personalization allows customized image generation of specific subjects, while conditional control enables the diffusion model to utilise conditioning images during image generation. The diffusion model must be fine-tuned on multiple datasets to enable these techniques.
The prototype implementation successfully demonstrates the core functionalities of the proposed solution. Based on a qualitative self-evaluation, the implemented architecture and pipeline demonstrate the primary fine-tuning techniques with satisfactory results. The fine-tuned latent diffusion model utilised in the prototype achieved a quantitative CLIP score of 71.15.