Image Generation Model Fine Tuning With Diffusers Models

Imagine art being created with just a few clicks! That is what this project is about. We explore image generation with Diffusers and Stable Diffusion to turn your imagination into real images.

Project Overview

This project focuses on enhancing image generation. We use Diffusers to fine-tune a pre-trained model so that it generates crisp, high-resolution images more quickly. But we don't stop there. Everything from learning rates to prompts can be adjusted to suit your requirements, and the trained model is converted back to the Stable Diffusion format for easier use in other applications.

You also get a simple Gradio interface where you can type a prompt and view the generated images. Whether you imagine a man running a marathon in outer space or any other exaggerated scene, this project handles it all!

Buckle up for an adventure as we bring technology into art.

Prerequisites

Before we dive into the code, here's what we'll need:

  • Knowledge of Python programming.
  • Understanding of deep learning and neural networks.
  • Access to Google Colab or another GPU environment.
  • Awareness of how to select a GPU runtime in Colab or use a local CUDA device.
  • Familiarity with Hugging Face and Gradio.
  • Basic image processing knowledge (resolution and pixel size).
  • Ability to manage CUDA and GPU resources (memory allocation, monitoring devices with nvidia-smi).
  • Understanding of Diffusers and Stable Diffusion models.

Approach

The strategy of this project revolves around improving the image generation process through systematic fine-tuning. First, a suitable pre-trained image generation model is fine-tuned with Diffusers, and hyperparameters such as the learning rate and batch size are adjusted to produce better images. Data augmentation further strengthens the model, and the training schedule can be adapted to the demands of the task. Once training is complete, the model is converted to the Stable Diffusion format so it is compatible with widely used diffusion-based frameworks. The project also provides a Gradio interface that lets you generate images interactively from prompts, keeping the process intuitive while scaling to larger datasets and many scenarios. Signals such as loss values and sample images help monitor the run and ensure the quality of results throughout training.
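
The augmentation mentioned above is not shown elsewhere in the notebook, so here is a minimal, illustrative sketch of how a small set of training photos could be expanded with light transforms before fine-tuning. It assumes torchvision is available (it is preinstalled in Colab) and uses folder names chosen only for illustration:

# Illustrative augmentation sketch (hypothetical folder names; not part of the training script itself)
import os
from PIL import Image
from torchvision import transforms

input_dir = "raw_images"        # hypothetical folder with your source photos
output_dir = "augmented_images" # hypothetical folder for the augmented copies
os.makedirs(output_dir, exist_ok=True)

# Light augmentations that keep the subject recognizable
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.RandomResizedCrop(512, scale=(0.9, 1.0)),
])

# Create a few augmented variants of each source image
for name in os.listdir(input_dir):
    img = Image.open(os.path.join(input_dir, name)).convert("RGB")
    for i in range(3):
        augment(img).save(os.path.join(output_dir, f"{os.path.splitext(name)[0]}_aug{i}.png"))

Whether such offline augmentation helps depends on the subject; for face datasets, flips and mild color jitter are usually safe, while heavy crops can hurt identity preservation.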

Workflow and Methodology

The workflow of this project includes several key steps, making it easy to follow:

  • Environment Setup: To begin with, the required libraries are installed with pip.
  • Dataset Collection and Preparation: Next, a dataset containing a variety of images is collected and prepared.
  • Model Fine-Tuning: Starting from a pre-trained model, we fine-tune it with the Diffusers DreamBooth script to improve image generation for our subject.
  • Training Process: Data augmentation, model optimization, and flexible parameter tweaking are used to make the model more proficient.
  • Conversion: After training, the model is converted back to the Stable Diffusion checkpoint format.
  • Interactive UI: Finally, a Gradio interface is built for creating new images from custom prompts.

Data Collection

Image collection is one of the most important phases of this project. You will need to take a reasonable number of pictures in sequence, making sure the images show your face from several different views. Ideally, you should have at least 25 images. After reviewing them, keep a balanced selection and discard the rest. This diversity is important for improving the robustness of the model during fine-tuning.

Data Preparation

For this project, you create the dataset yourself by taking the photos. It is important that the images are captured from different positions, and image quality matters, so set aside only the good images for later use. These images are then processed and prepared for training the Diffusers-based model.

Data Preparation Workflow

  • Image Capture: Take all images from different viewpoints, making sure multiple pictures cover various angles.
  • Image Sorting: Review every captured image and keep only the sharpest, best-composed shots.
  • Final Dataset: Retain at least 25 of the best images, covering different angles of the subject, in the final dataset used for training (see the optional preprocessing sketch below).
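
Before training, it can also help to normalize the selected photos to the 512x512 resolution used later by the training command. A minimal sketch, assuming Pillow is available and using folder names chosen only for illustration:

# Illustrative preprocessing sketch (hypothetical folder names)
import os
from PIL import Image, ImageOps

src_dir = "selected_images"  # hypothetical folder with the photos you kept
dst_dir = "dataset_512"      # hypothetical output folder for training images
os.makedirs(dst_dir, exist_ok=True)

for name in os.listdir(src_dir):
    img = Image.open(os.path.join(src_dir, name)).convert("RGB")
    # Resize and center-crop to a 512x512 square without distorting the subject
    img = ImageOps.fit(img, (512, 512), method=Image.LANCZOS)
    img.save(os.path.join(dst_dir, os.path.splitext(name)[0] + ".png"))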

Explanation of All Code

STEP 1:

Gathering GPU information

In this code, we query GPU information using the NVIDIA System Management Interface (nvidia-smi). This ensures that the environment is set up correctly for training.

# Command to query GPU information (name, total memory, and free memory) using NVIDIA System Management Interface (nvidia-smi)
# The output is formatted in CSV format with no header
!nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader

Two Python scripts are downloaded from a GitHub repository: one for training the model and another for converting Diffusers models to the original Stable Diffusion format. The ShivamShrirao fork of the Diffusers library is then installed directly from GitHub, and the Triton library is upgraded to its latest pre-release for additional deep-learning optimizations.

In addition, packages such as `accelerate`, `transformers`, `ftfy`, `bitsandbytes`, `gradio`, and `natsort` are installed; they cover GPU-optimized training, text handling, memory-efficient optimizers, the web interface, and natural file sorting.

# Downloading the training script for the DreamBooth project from the Diffusers GitHub repository
!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/examples/dreambooth/train_dreambooth.py
# Downloading the script to convert Diffusers models to the original Stable Diffusion format
!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/scripts/convert_diffusers_to_original_stable_diffusion.py
# Installing the Diffusers library from the GitHub repository
%pip install -qq git+https://github.com/ShivamShrirao/diffusers
# Upgrading the Triton library to the latest pre-release version
%pip install -q -U --pre triton
# Installing additional Python packages required for the project
%pip install -q accelerate transformers ftfy bitsandbytes gradio natsort

STEP 2:

Input the Hugging Face Token

This script configures Hugging Face authentication. It creates the hidden directory ~/.huggingface if it does not exist, defines the Hugging Face API token, and writes the token to the file ~/.huggingface/token, which is required for Hugging Face services such as downloading models.

# Create a directory ~/.huggingface if it doesn't exist
!mkdir -p ~/.huggingface
# Set Hugging Face token for authentication
HUGGINGFACE_TOKEN = "your_huggingface_token"  # Replace with your own Hugging Face access token
# Write the Hugging Face token to a file named token within the ~/.huggingface directory
!echo -n "{HUGGINGFACE_TOKEN}" > ~/.huggingface/token
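
Writing the token file by hand works fine in Colab, but if you prefer a library call, the `huggingface_hub` package (pulled in as a dependency of `transformers` and `diffusers`) offers a programmatic login. A minimal sketch, reusing the token variable defined above:

# Optional alternative: log in via the huggingface_hub helper instead of writing the token file manually
from huggingface_hub import login

# Uses the HUGGINGFACE_TOKEN defined earlier; the helper stores the token in the expected cache location
login(token=HUGGINGFACE_TOKEN)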

This block sets the model name and output directory for saving trained weights, optionally mounting Google Drive for storage and ensuring the output directory exists.
# Boolean flag to indicate whether to save to Google Drive
save_to_gdrive = True
# If save_to_gdrive is True, mount Google Drive to the Colab environment
if save_to_gdrive:
    from google.colab import drive
    drive.mount('/content/drive')
# Name/Path of the initial model
MODEL_NAME = "stabilityai/stable-diffusion-2"
# Define the directory name to save the model weights
OUTPUT_DIR = "stable_diffusion_weights/aionlinecourse"
# If save_to_gdrive is True, set the output directory path to be within the mounted Google Drive (under MyDrive)
if save_to_gdrive:
    OUTPUT_DIR = "/content/drive/MyDrive/" + OUTPUT_DIR
else:
    # Otherwise, set the output directory path to be within the Colab environment
    OUTPUT_DIR = "/content/" + OUTPUT_DIR
# Print the path where the weights will be saved
print(f"[*] Weights will be saved at {OUTPUT_DIR}")
# Create the output directory if it doesn't already exist
!mkdir -p $OUTPUT_DIR

A list of dictionaries representing concepts is stored in the concepts_list variable, specifying prompts and data directories. The code then creates the necessary instance data directories and writes the concepts list to a JSON file.

# Define a list of dictionaries representing concepts
concepts_list = [
    {
        "instance_prompt": "art by aionlinecourse",
        "class_prompt": "art by a person",
        "instance_data_dir": "/content/drive/stable_diffusion_weights/aionlinecourse",
        "class_data_dir": "/content/data/artbyperson"
    }
]
# Import the json and os modules
import json
import os
# Create directories specified in the instance_data_dir attribute of each concept dictionary
# If the directories already exist, no error will be raised (exist_ok=True)
for c in concepts_list:
    os.makedirs(c["instance_data_dir"], exist_ok=True)
# Write the concepts_list to a JSON file named concepts_list.json with indentation for readability
with open("concepts_list.json", "w") as f:
    json.dump(concepts_list, f, indent=4)

STEP 3:

Upload the training data

This code uploads instance images from your local system into the instance data directory of each concept listed in concepts_list. For each concept, it prints a message indicating which images are being uploaded, and shutil.move() relocates every uploaded file to its assigned directory.

# Import necessary modules
import os
from google.colab import files
import shutil
# Iterate over each concept in concepts_list
for c in concepts_list:
    # Print a message indicating the instance images being uploaded for the current concept
    print(f"Uploading instance images for `{c['instance_prompt']}`")
    # Upload files from the user's local system and store them in the uploaded dictionary
    uploaded = files.upload()
    # Iterate over each uploaded file
    for filename in uploaded.keys():
        # Define the destination path where the file will be moved
        dst_path = os.path.join(c['instance_data_dir'], filename)
        # Move the uploaded file to the destination path
        shutil.move(filename, dst_path)

STEP 4:

Fine-tune model and train

The training script train_dreambooth.py is launched with the accelerate command-line tool. It runs with the specified parameters, including model paths, batch sizes, learning rates, and other hyperparameters, to fine-tune the model on the provided instance images. The meaning of each flag is summarized in the comments above the command.

# Launching the DreamBooth training script using the 'accelerate' command-line tool.
# Key arguments:
#   --pretrained_model_name_or_path : path or name of the pre-trained base model
#   --pretrained_vae_name_or_path   : path or name of the pre-trained VAE to use
#   --output_dir                    : directory where the trained model and related files are saved
#   --revision="fp16"               : use the fp16 revision of the base model weights
#   --with_prior_preservation / --prior_loss_weight : enable prior preservation and set its loss weight
#   --seed                          : seed for reproducibility
#   --resolution                    : resolution of the training images
#   --train_batch_size              : batch size for training
#   --train_text_encoder            : also fine-tune the text encoder
#   --mixed_precision="fp16"        : train with 16-bit mixed precision
#   --use_8bit_adam                 : use the 8-bit Adam optimizer to save memory
#   --gradient_accumulation_steps / --gradient_checkpointing : trade compute for memory
#   --learning_rate / --lr_scheduler / --lr_warmup_steps : optimizer learning-rate schedule
#   --num_class_images              : number of class (regularization) images
#   --sample_batch_size             : batch size when sampling images
#   --max_train_steps               : maximum number of training steps
#   --save_interval                 : checkpoint-saving interval (in steps)
#   --save_sample_prompt            : prompt used for the sample images saved during training
#   --concepts_list                 : JSON file listing the training concepts
# Note: the flags must stay on backslash-continued lines with no comments in between,
# otherwise the shell command breaks.
!accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
  --output_dir=$OUTPUT_DIR \
  --revision="fp16" \
  --with_prior_preservation \
  --prior_loss_weight=1.0 \
  --seed=1337 \
  --resolution=512 \
  --train_batch_size=1 \
  --train_text_encoder \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --gradient_accumulation_steps=1 \
  --gradient_checkpointing \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=1 \
  --sample_batch_size=4 \
  --max_train_steps=500 \
  --save_interval=100 \
  --save_sample_prompt="a very good man, art by aionlinecourse" \
  --concepts_list="concepts_list.json"

STEP 5:

Convert the Trained Weights to a "model.ckpt" File and Save the Model

This piece of code checks whether the WEIGHTS_DIR variable is an empty string. If it is, the glob, os, and natsort modules are imported, a list of all files and directories inside OUTPUT_DIR is collected, natsorted sorts that list, and the last (most recent) directory is assigned to WEIGHTS_DIR. Finally, it prints the directory path containing the trained weights.

# Define the directory path where the trained weights are located
WEIGHTS_DIR = "/content/drive/MyDrive/stable_diffusion_weights/aionlinecourse/500"
# Check if the WEIGHTS_DIR is an empty string
if WEIGHTS_DIR == "":
    # If WEIGHTS_DIR is empty, import necessary modules
    from natsort import natsorted
    from glob import glob
    import os
    # Use glob to find all files and directories in OUTPUT_DIR and sort them using natsort
    sorted_dirs = natsorted(glob(OUTPUT_DIR + os.sep + "*"))
    # Select the last directory in the sorted list as WEIGHTS_DIR
    WEIGHTS_DIR = sorted_dirs[-1]
# Print the directory path where the trained weights are located
print(f"[*] WEIGHTS_DIR={WEIGHTS_DIR}")

This code converts the Diffusers model to the original Stable Diffusion checkpoint format using the conversion script downloaded earlier. It builds the checkpoint path, optionally enables half-precision output, runs the script, and prints the location of the converted checkpoint.
# Concatenate WEIGHTS_DIR and "/model.ckpt" to create the checkpoint path
ckpt_path = WEIGHTS_DIR + "/model.ckpt"
# Initialize half_arg variable as an empty string
half_arg = ""
# Set fp16 flag to True
fp16 = True
# Check if fp16 is True
if fp16:
    # If fp16 is True, set half_arg to "--half"
    half_arg = "--half"
# Execute the script to convert Diffusers models to the original Stable Diffusion format
!python convert_diffusers_to_original_stable_diffusion.py --model_path $WEIGHTS_DIR --checkpoint_path $ckpt_path $half_arg
# Print a message indicating the completion of the conversion process
print(f"[*] Converted ckpt saved at {ckpt_path}")

This code imports the required modules (including display from IPython.display), sets the model path to WEIGHTS_DIR, defines a DDIMScheduler, and loads the fine-tuned model into a StableDiffusionPipeline that is moved to the GPU.
# Import necessary modules
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline, DDIMScheduler
from IPython.display import display
# Set the model path to WEIGHTS_DIR
model_path = WEIGHTS_DIR
# Define the DDIMScheduler with specified parameters
scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False)
# Load the Stable Diffusion Pipeline from the pre-trained model
# Set scheduler, safety_checker to None, and torch_dtype to torch.float16
# Move the pipeline to GPU
pipe = StableDiffusionPipeline.from_pretrained(model_path, scheduler=scheduler, safety_checker=None, torch_dtype=torch.float16).to("cuda")

The code then initializes a torch Generator object, g_cuda, on the CUDA device and sets its seed to 47853 so that image generation is reproducible.
# Initialize g_cuda variable
g_cuda = None
# Initialize a torch Generator object for CUDA device
g_cuda = torch.Generator(device='cuda')
# Set a seed value for the generator
seed = 47853
g_cuda.manual_seed(seed)

STEP 6:

Run For Generating Images

This block of code defines the prompt and parameters for image generation with the Stable Diffusion pipeline. Inference runs under autocast with CUDA and inference mode enabled, and the display function from IPython.display shows the generated images. Each line carries a comment explaining the parameter it sets.

# Define the prompt for generating images
prompt = "A man in a spacesuit is running a marathon in the road." #@param {type:"string"}
# Define the number of samples to generate
num_samples = 2 #@param {type:"number"}
# Define the guidance scale for generating images
guidance_scale = 9 #@param {type:"number"}
# Define the number of inference steps for generating images
num_inference_steps = 100 #@param {type:"number"}
# Define the height of the generated images
height = 512 #@param {type:"number"}
# Define the width of the generated images
width = 512 #@param {type:"number"}
# Perform inference in autocast mode with CUDA and inference mode enabled
with autocast("cuda"), torch.inference_mode():
    # Generate images using the Stable Diffusion Pipeline
    images = pipe(
        prompt,
        height=height,
        width=width,
        num_images_per_prompt=num_samples,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=g_cuda
    ).images
# Display the generated images
for img in images:
    display(img)
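
If you also want to keep the generated samples on disk (for example to copy them to Drive later), here is a small optional addition, using an outputs folder name chosen only for illustration:

# Optional: save the generated images to a local folder (hypothetical folder name)
import os

os.makedirs("outputs", exist_ok=True)
for i, img in enumerate(images):
    img.save(f"outputs/sample_{i}.png")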

STEP 7:

Run Gradio UI For Generating Images

This section sets up a Gradio interface for interactive image generation. It defines input components for prompts, image parameters, and a button to trigger the generation process. This allows users to visualize the generated images easily.

# Import Gradio library as gr
import gradio as gr
# Define the inference function for generating images
def inference(prompt, negative_prompt, num_samples, height=512, width=512, num_inference_steps=50, guidance_scale=7.5):
    # Perform inference in autocast mode with CUDA and inference mode enabled
    with torch.autocast("cuda"), torch.inference_mode():
        # Generate images using the Stable Diffusion Pipeline
        return pipe(
                prompt, height=int(height), width=int(width),
                negative_prompt=negative_prompt,
                num_images_per_prompt=int(num_samples),
                num_inference_steps=int(num_inference_steps), guidance_scale=guidance_scale,
                generator=g_cuda
            ).images
# Create a Gradio Blocks interface named demo
with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            # Add a textbox for entering the prompt
            prompt = gr.Textbox(label="Prompt", value="a very good man, art by aionlinecourse")
            # Add a textbox for entering the negative prompt
            negative_prompt = gr.Textbox(label="Negative Prompt", value="")
            # Add a button for triggering the image generation process
            run = gr.Button(value="Generate")
            with gr.Row():
                # Add a number input for specifying the number of samples to generate
                num_samples = gr.Number(label="Number of Samples", value=4)
                # Add a number input for specifying the guidance scale
                guidance_scale = gr.Number(label="Guidance Scale", value=7.5)
            with gr.Row():
                # Add a number input for specifying the height of the generated images
                height = gr.Number(label="Height", value=512)
                # Add a number input for specifying the width of the generated images
                width = gr.Number(label="Width", value=512)
            # Add a slider for specifying the number of inference steps
            num_inference_steps = gr.Slider(label="Steps", value=50)
        with gr.Column():
            # Add a gallery component for displaying the generated images
            gallery = gr.Gallery()
    # Define the callback function for the button click event
    run.click(inference, inputs=[prompt, negative_prompt, num_samples, height, width, num_inference_steps, guidance_scale], outputs=gallery)
# Launch the Gradio interface for generating images
demo.launch(debug=True)

Project Conclusion

In summary, this project showcases how to fine-tune powerful image generation models using Diffusers and Stable Diffusion. By combining DreamBooth-style fine-tuning with the Diffusers toolchain, we significantly improved image generation for our subject and showed how deep learning can be tailored to produce diverse, realistic, high-resolution outputs. The Gradio interface makes the result a highly interactive, user-friendly tool that generates images in real time from user input.

The project serves as a valuable demonstration of cutting-edge AI techniques in image generation, with applications in art, design, media, and more. The methods developed here can be extended for various creative and commercial purposes and prove the potential of diffusion models in modern AI-driven projects.

Challenges and Troubleshooting

  • Challenge: Dependency Conflicts
    Solution: Always update packages carefully. Use the command pip install -U for updates and check for any warnings in the logs. Stick to the recommended versions in the script to avoid conflicts.

  • Challenge: GPU Availability
    Solution: Use Google Colab Pro or Kaggle Notebooks for extra GPU support. You can also reduce batch sizes or optimize settings like image resolution to lessen GPU strain.

  • Challenge: Hugging Face Token Issues
    Solution: Make sure to generate a valid Hugging Face token from your account. Store it securely, and ensure it’s passed correctly in the environment setup commands.

  • Challenge: Model Training Speed
    Solution: Use techniques like gradient accumulation and the 8-bit Adam optimizer to speed up training without sacrificing performance. Also, check GPU utilization regularly to ensure the hardware is working efficiently (a small monitoring sketch follows this list).

  • Challenge: Converting to Stable Diffusion Format
    Solution: Follow the conversion steps exactly as outlined. Double-check directory paths and ensure you're using fp16 precision if needed for your hardware.
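
As a quick way to check memory headroom from inside the notebook (a minimal sketch using standard PyTorch calls, complementing the nvidia-smi command from Step 1):

# Report free and total GPU memory, plus what PyTorch itself has allocated and reserved
import torch

free_bytes, total_bytes = torch.cuda.mem_get_info()
print(f"Free GPU memory     : {free_bytes / 1024**3:.2f} GiB of {total_bytes / 1024**3:.2f} GiB")
print(f"Allocated by PyTorch: {torch.cuda.memory_allocated() / 1024**3:.2f} GiB")
print(f"Reserved by PyTorch : {torch.cuda.memory_reserved() / 1024**3:.2f} GiB")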

Also, for a better understanding, you can visit our resources and tutorials to explore more.

FAQs

Question 1: What is the aim of this project?
Answer: The project focuses on improving image generation by fine-tuning an already available image generation model. With Diffusers, it can produce high-resolution images and demonstrates that diffusion models are well suited to image generation.

Question 2: What is the role of Diffusers in image generation?
Answer: Diffusers is used to enhance the image generation ability and performance of the model. It helps find the right compromise between generation speed and output quality by controlling how images are created through the diffusion process.

Question 3: How is the diffusion model modified to generate better images?
Answer: The model is adjusted by changing hyperparameters such as the learning rate, batch size, and architectural settings. The fine-tuning procedure uses a new dataset, to which data augmentation is applied, so the model learns to generate better images.

Question 4: What difficulties were faced while training the model?
Answer: Some difficulties were related to working with large datasets, memory limitations, the performance and quality of generation, and conversion issues between the Diffusers and Stable Diffusion formats. Hyperparameter tuning and optimizing the infrastructure helped overcome these problems.

Question 5: In what way can the user make use of the fine-tuned model?
Answer: The fine-tuned model is made accessible through a Gradio interface, where users provide prompts and the model generates images in response.
