Classifier-free Guidance with Cosine Schedules Pt. 5

Exploring a range of guidance values for T-Normalization.

Introduction

This notebook is Part 5 in a series on dynamic Classifier-free Guidance. It explores smaller guidance values, $G_\text{small}$, for the normalizations.

Recap of Parts 1-4

The first three parts explored how to turn Classifier-free Guidance into a dynamic process. We found an initial set of schedules and normalizers that seem to improve the quality of Diffusion images. We then dug in and refined a few of the most promising schedules.

Part 5: Exploring values for T-Normalization

Part 5 answers the question: what should the value of $G_\text{small}$ be for T-Normalization and Full Normalization?

Recall that these two normalizations scale the update vector $\left(t - u \right)$. That places the update vector on a different scale than the unconditioned vector $u$. If we then scale the update vector by a large scalar, say $G = 7.5$, the output collapses to noise. In fact, it seems to collapse to the true mode of the latent image distribution: uniform, brown values.

These two normalizations are very promising: they improve the syntax and details of the image. However, we only explored a single value, $G_\text{small} = 0.15$. This is very different from the default $G = 7.5$ that has been thoroughly explored in regular Classifier-free Guidance.

This notebook tries to find a good starting point for $G_\text{small}$, so we can try the normalizations with our best schedules so far.
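
To make the scale mismatch concrete, here is a minimal sketch in plain Python. It assumes that T-Normalization rescales the update vector $(t - u)$ to the norm of $u$ before multiplying by $G$. That formula, the helper names, and the toy vectors are illustrative assumptions, not the `cf_guidance` implementation.

```python
import math

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def t_norm_update(u, t, g):
    """Assumed T-Normalization: rescale (t - u) to the norm of u, then scale by g."""
    diff = [ti - ui for ui, ti in zip(u, t)]
    scale = g * norm(u) / norm(diff)
    return [scale * d for d in diff]

# toy stand-ins for the unconditioned (u) and text-conditioned (t) predictions
u = [1.0, -2.0, 0.5, 3.0]
t = [1.1, -1.9, 0.4, 3.2]

for g in (0.15, 7.5):
    update = t_norm_update(u, t, g)
    ratio = norm(update) / norm(u)
    print(f"G = {g}: update norm is {ratio:.2f}x the norm of u")
```

Under this assumption the update is always $G$ times as large as $u$. With $G = 7.5$ the update dominates the prediction, while a small value keeps it a modest correction.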

Python imports

We start with a few basic Python imports.

import os
import random
from functools import partial
import torch
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

Seed for reproducibility

seed_everything makes sure that the results are reproducible across notebooks.

# set the seed and pseudo random number generator
SEED = 1024
def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    generator = torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return generator

# for sampling the initial, noisy latents
generator = seed_everything(SEED)
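
As a quick illustration of what seeding buys us, the sketch below re-seeds a generator and checks that it reproduces the same draws. It uses only Python's built-in `random` module as a stand-in; the `draw` helper is hypothetical, and the same idea is assumed to carry over to the NumPy and torch generators seeded by `seed_everything`.

```python
import random

def draw(seed, n=3):
    """Seed Python's RNG, then draw n floats from it."""
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = draw(1024)
second = draw(1024)   # same seed: the sequence repeats
other = draw(2048)    # different seed: a different sequence

print('same seed matches:     ', first == second)
print('different seed matches:', first == other)
```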

Constant schedules with a range of $G_\text{small}$ values

We can try different $G_\text{small}$ values using the cf_guidance library.

# helpers to create cosine schedules
from cf_guidance.schedules  import get_cos_sched

# normalizations for classifier-free guidance
from cf_guidance.transforms import TNormGuidance, FullNormGuidance

For the other schedule parameters, we will use the same values from the running series on dynamic Classifier-free Guidance.

# Default schedule parameters from the blog post
######################################
max_val           = 0.15  # guidance scaling value
min_val           = 0.0   # minimum guidance scaling
num_steps         = 50    # number of diffusion steps
num_warmup_steps  = 0     # number of warmup steps
warmup_init_val   = 0     # the initial warmup value
num_cycles        = 0.5   # number of cosine cycles
k_decay           = 1     # k-decay for cosine curve scaling 
######################################

DEFAULT_COS_PARAMS = {
    'max_val':           max_val,
    'num_steps':         num_steps,
    'min_val':           min_val,
    'num_cycles':        num_cycles,
    'k_decay':           k_decay,
    'num_warmup_steps':  num_warmup_steps,
    'warmup_init_val':   warmup_init_val,
}

def cos_harness(new_params=None, default_params=None):
    '''Creates cosine schedules with updated parameters in `new_params`
    '''
    # start from a copy of the given baseline `default_params`
    cos_params = dict(default_params or {})
    # update the schedule with any new parameters
    cos_params.update(new_params or {})

    # return the new cosine schedule
    sched = get_cos_sched(**cos_params)
    return sched


def create_expts(params: dict, schedule_func) -> list:
    '''Creates a list of experiments.
    
    Each element is a dictionary with the name, value, and schedule for a given parameter.
    A `title` field is also added for easy plotting.
    '''
    names = sorted(params)
    expts = []
    # step through parameter names and their values
    for name in names:
        for val in params[name]:
            # create the experiment
            expt = {'param_name': name,
                    'val': val,
                    'schedule': schedule_func(val)}
                    # 'schedule': schedule_func({name: val})}
            # name for plotting
            expt['title'] = f'Param: "{name}", val={val}'
            # add it to the experiment list
            expts.append(expt)
    return expts


# create the constant G_small experiments
const_params = {'max_val': [0.01, 0.03, 0.05, 0.08, 0.1, 0.15, 0.2, 0.22, 0.25, 0.3]}
const_func = lambda val: [val for _ in range(num_steps)]
const_expts = create_expts(const_params, const_func)
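
The sweep above uses constant schedules rather than cosine ones. As a rough illustration of the difference, the sketch below builds both kinds of schedule in plain Python. `half_cosine_schedule` and `constant_schedule` are hypothetical stand-ins; the cosine version ignores warmup and `k_decay` and is not the `get_cos_sched` implementation.

```python
import math

def half_cosine_schedule(max_val, min_val, num_steps, num_cycles=0.5):
    """Hypothetical cosine schedule: decays from max_val to min_val
    over num_steps when num_cycles is 0.5."""
    sched = []
    for step in range(num_steps):
        # progress runs from 0 at the first step to 1 at the last step
        progress = step / max(1, num_steps - 1)
        cos_term = 0.5 * (1.0 + math.cos(2.0 * math.pi * num_cycles * progress))
        sched.append(min_val + (max_val - min_val) * cos_term)
    return sched

def constant_schedule(val, num_steps):
    """The same guidance value at every diffusion step."""
    return [val] * num_steps

cos_sched = half_cosine_schedule(max_val=0.15, min_val=0.0, num_steps=50)
const_sched = constant_schedule(0.15, num_steps=50)

print('cosine:   first', round(cos_sched[0], 4), 'last', round(cos_sched[-1], 4))
print('constant: first', const_sched[0], 'last', const_sched[-1])
```

A constant schedule isolates the effect of the guidance value itself, which is why it suits a sweep over $G_\text{small}$.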

Plotting the $G_\text{small}$ values

#| echo: false
#| output: true
colors=list(mcolors.TABLEAU_COLORS)

# setup the plot
fig,ax = plt.subplots(figsize=(12,8))
plt.title('Constant Schedules for G_small', fontsize='xx-large')
plt.xlabel('Diffusion timesteps', fontsize='x-large')
plt.ylabel('Guidance parameter', fontsize='x-large')

# plot each G_small value
for idx,s in enumerate(const_expts):
    ax.plot(s['schedule'], c=colors[idx], label=f'G_small: {s["val"]:.4f}')
    
plt.legend()
plt.tight_layout();

Loading the openjourney model from Prompt Hero

The min_diffusion library loads a Stable Diffusion model from the HuggingFace hub.

# to load Stable Diffusion pipelines
from min_diffusion.core import MinimalDiffusion

# to plot generated images
from min_diffusion.utils import show_image, image_grid, plot_grid

We use it to load the openjourney model on the GPU in torch.float16 precision.

model_name = 'prompthero/openjourney'
device     = 'cuda'
dtype      = torch.float16
pipeline = MinimalDiffusion(model_name, device, dtype, generator=generator)
pipeline.load();
Enabling default unet attention slicing.

Text prompt for image generations

We use the familiar, running prompt in our series to generate an image:

"a photograph of an astronaut riding a horse"

:::: {.callout-important}
The openjourney model was fine-tuned to create images in the style of Midjourney v4.

To enable this fine-tuned style, we need to add the keyword "mdjrny-v4 style" at the start of the prompt.
::::

# text prompt for image generations
prompt = "mdjrny-v4 style a photograph of an astronaut riding a horse"

Image parameters

Images will be generated over $50$ diffusion steps. They will have a height and width of 512 x 512 pixels.

# the number of diffusion steps
num_steps = 50

# generated image dimensions
width, height = 512, 512

Running the experiments

The run function below generates images for the given prompt.

It also stores the output images with a matching title for plotting and visualizations.

def run(prompt, schedules, guide_tfm=None, generator=None,
        show_each=False, test_run=False):
    """Runs a dynamic Classifier-free Guidance experiment. 
    
    Generates an image for the text `prompt` given all the values in `schedules`.
    Uses a Guidance Transformation class from the `cf_guidance` library.  
    Stores the output images with a matching title for plotting. 
    Optionally shows each image as it's generated.
    If `test_run` is true, it runs a single schedule for testing. 
    """
    # store generated images and their title (the experiment name)
    images, titles = [], []
    
    # make sure we have a valid guidance transform
    assert guide_tfm is not None, 'a guidance transform class is required'
    print(f'Using Guidance Transform: {guide_tfm}')
    
    # optionally run a single test schedule
    if test_run:
        print('Running a single schedule for testing.')
        schedules = schedules[:1]
        
    # run all schedule experiments
    for i,s in enumerate(schedules):
        
        # parse out the title for the current run
        cur_title  = s['title']
        titles.append(cur_title)
        
        # create the guidance transformation 
        cur_sched = s['schedule']
        gtfm = guide_tfm({'g': cur_sched})
        
        print(f'Running experiment [{i+1} of {len(schedules)}]: {cur_title}...')
        img = pipeline.generate(prompt, gtfm, generator=generator)
        images.append(img)
        
        # optionally plot the image
        if show_each:
            show_image(img, scale=1)

    print('Done.')
    return {'images': images,
            'titles': titles}
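
The `run` function passes each schedule to the guidance transform as `{'g': cur_sched}`. The sketch below shows one way a per-step schedule like this might be consumed during denoising. The `ToyGuidance` class, its call signature, and the toy vectors are hypothetical; it applies plain Classifier-free Guidance, $u + G \cdot (t - u)$, rather than the library's normalized variants.

```python
class ToyGuidance:
    """Hypothetical stand-in for a guidance transform; not the cf_guidance API."""

    def __init__(self, schedules):
        # `schedules` maps a parameter name to one value per diffusion step
        self.schedules = schedules

    def __call__(self, u, t, idx):
        # look up the guidance value for the current diffusion step
        g = self.schedules['g'][idx]
        # plain Classifier-free Guidance: move from u toward t by a factor of g
        return [ui + g * (ti - ui) for ui, ti in zip(u, t)]

toy_steps = 5
gtfm = ToyGuidance({'g': [0.15] * toy_steps})

# toy unconditioned (u) and text-conditioned (t) predictions
u = [0.0, 1.0]
t = [1.0, 3.0]

# one guided prediction per diffusion step
preds = [gtfm(u, t, idx) for idx in range(toy_steps)]
print(preds[0])
```

Because the schedule is indexed by step, swapping the constant list for a decaying one changes the guidance strength over time without touching the transform itself.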

Sweeping the $G_\text{small}$ values

Now we generate images for the range of constant $G_\text{small}$ values. Then we will check the outputs to see what a good, default value might be.

T-Normalization with $G_\text{small}$ sweep

print('Running the G_small sweep experiments...')
t_norm_res = run(prompt, const_expts, guide_tfm=TNormGuidance)
Running the G_small sweep experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.TNormGuidance'>
Running experiment [1 of 10]: Param: "max_val", val=0.01...
Running experiment [2 of 10]: Param: "max_val", val=0.03...
Running experiment [3 of 10]: Param: "max_val", val=0.05...
Running experiment [4 of 10]: Param: "max_val", val=0.08...
Running experiment [5 of 10]: Param: "max_val", val=0.1...
Running experiment [6 of 10]: Param: "max_val", val=0.15...
Running experiment [7 of 10]: Param: "max_val", val=0.2...
Running experiment [8 of 10]: Param: "max_val", val=0.22...
Running experiment [9 of 10]: Param: "max_val", val=0.25...
Running experiment [10 of 10]: Param: "max_val", val=0.3...
Done.

Full Normalization with $G_\text{small}$ sweep

print('Running the G_small sweep experiments...')
full_norm_res = run(prompt, const_expts, guide_tfm=FullNormGuidance)
Running the G_small sweep experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.FullNormGuidance'>
Running experiment [1 of 10]: Param: "max_val", val=0.01...
Running experiment [2 of 10]: Param: "max_val", val=0.03...
Running experiment [3 of 10]: Param: "max_val", val=0.05...
Running experiment [4 of 10]: Param: "max_val", val=0.08...
Running experiment [5 of 10]: Param: "max_val", val=0.1...
Running experiment [6 of 10]: Param: "max_val", val=0.15...
Running experiment [7 of 10]: Param: "max_val", val=0.2...
Running experiment [8 of 10]: Param: "max_val", val=0.22...
Running experiment [9 of 10]: Param: "max_val", val=0.25...
Running experiment [10 of 10]: Param: "max_val", val=0.3...
Done.

Results

T-Normalization $G_\text{small}$ results

#| echo: false
#| output: true
# display all images
image_grid(t_norm_res['images'], title=t_norm_res['titles'], rows=5, width=width, height=height)

Full Normalization $G_\text{small}$ results

#| echo: false
#| output: true
# display all images
image_grid(full_norm_res['images'], title=full_norm_res['titles'], rows=5, width=width, height=height)