Classifier-free Guidance with Cosine Schedules Pt. 6
Combining the best schedules and normalizations so far.
Introduction
This notebook is Part 6 in a series on dynamic Classifier-free Guidance. It combines the best schedules and normalizations we’ve found so far.
Recap of Parts 1-5
The first five parts explored how to turn Classifier-free Guidance into a dynamic process. We found a good set of schedules and normalizations that seem to improve the output of diffusion image models.
Part 6: Putting it all together
Part 6 brings together our best approaches so far. Specifically, it explores the following schedules:
kDecay with large \(k\) values.
Inverse kDecay with small \(k\) values.
On all three Guidance normalizations (sketched briefly after this list):
Prediction Normalization
T-Normalization
Full Normalization
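To ground what these transforms operate on, here is a rough sketch of the plain Classifier-free Guidance update together with one simple way of normalizing it. This is an illustration only; the exact BaseNorm, T-Norm, and Full-Norm math lives in the cf_guidance.transforms classes imported later in this notebook.
import torch

def cfg_sketch(uncond, cond, g, normalize=False):
    """Illustrative sketch of a Classifier-free Guidance step.

    `uncond` and `cond` are the unconditional and text-conditioned noise
    predictions, and `g` is the guidance value for the current step.
    NOTE: the normalization below is a generic example, not the exact
    cf_guidance BaseNorm/TNorm/FullNorm implementations.
    """
    # standard CFG: push the prediction away from the unconditional output
    pred = uncond + g * (cond - uncond)
    if normalize:
        # rescale the guided prediction back to the conditional norm,
        # one way to keep large `g` values from blowing up the update
        pred = pred * (cond.norm() / pred.norm())
    return pred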
Python imports
We start with a few python imports.
import os
import random
from functools import partial
import torch
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
Seed for reproducibility
seed_everything makes sure that the results are reproducible across notebooks.
# set the seed and pseudo random number generator
SEED = 1024
def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    generator = torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return generator

# for sampling the initial, noisy latents
generator = seed_everything(SEED)
Cosine schedules with k-decay
We create the schedules with different \(k\) values using the cf_guidance library.
# helpers to create cosine schedules
from cf_guidance.schedules import get_cos_sched
# normalizations for classifier-free guidance
from cf_guidance.transforms import GuidanceTfm, BaseNormGuidance, TNormGuidance, FullNormGuidance
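As a reminder of the shape these schedules take, the snippet below is a rough, assumed form of a half-cycle cosine schedule with k-decay: the exponent \(k\) warps the progress term \((t/T)^k\), so large \(k\) keeps the guidance high for longer while small \(k\) drops it quickly. The real schedules come from get_cos_sched; this sketch is only for intuition.
import math

def cos_k_sketch(num_steps, max_val=7.5, min_val=1.0, k=1.0):
    # assumed form, for intuition only: a half-cosine from `max_val` down to
    # `min_val`, with (t/T)**k warping how quickly the decay happens
    T = max(num_steps - 1, 1)
    return [min_val + 0.5 * (max_val - min_val) * (1 + math.cos(math.pi * (t / T) ** k))
            for t in range(num_steps)]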
For the other schedule parameters, we keep the same values from the rest of the series. The functions below are also shared with previous notebooks.
# Default schedule parameters from the blog post
######################################
max_val = 7.5            # guidance scaling value
min_val = 1.             # minimum guidance scaling
num_steps = 50           # number of diffusion steps
num_warmup_steps = 0     # number of warmup steps
warmup_init_val = 0      # the initial warmup value
num_cycles = 0.5         # number of cosine cycles
k_decay = 1              # k-decay for cosine curve scaling

# smaller values for T-Norm and FullNorm
max_T = 0.2
min_T = 0.01
######################################

DEFAULT_COS_PARAMS = {
    'max_val': max_val,
    'num_steps': num_steps,
    'min_val': min_val,
    'num_cycles': num_cycles,
    'k_decay': k_decay,
    'num_warmup_steps': num_warmup_steps,
    'warmup_init_val': warmup_init_val,
}

DEFAULT_T_PARAMS = {
    'max_val': max_T,
    'num_steps': num_steps,
    'min_val': min_T,
    'num_cycles': num_cycles,
    'k_decay': k_decay,
    'num_warmup_steps': num_warmup_steps,
    'warmup_init_val': warmup_init_val,
}
def cos_harness(default_params, new_params):
    '''Creates a cosine schedule with the updated parameters in `new_params`.'''
    # start from the given baseline `default_params`
    cos_params = dict(default_params)
    # update with the new, given parameters
    cos_params.update(new_params)
    # return the new cosine schedule
    sched = get_cos_sched(**cos_params)
    return sched
def create_expts(params: dict, schedule_func) -> list:
    '''Creates a list of experiments.

    Each element is a dictionary with the name, value, and schedule for a given parameter.
    A `title` field is also added for easy plotting.
    '''
    names = sorted(params)
    expts = []
    # step through parameter names and their values
    for i, name in enumerate(names):
        for j, val in enumerate(params[name]):
            # create the experiment
            expt = {'param_name': name,
                    'val': val,
                    'schedule': schedule_func(new_params={name: val})}
            # name for plotting
            expt['title'] = f'Param: "{name}", val={val}'
            # add it to the experiment list
            expts.append(expt)
    return expts
Next we create the best k-decay cosine schedules.
# create the k-decay cosine experiments
k_params = {'k_decay': [1, 2, 5]}
k_func = partial(cos_harness, default_params=DEFAULT_COS_PARAMS)
k_expts = create_expts(k_params, k_func)

# setup for the Inverse-k-decay cosine schedules
inv_k_params = {'k_decay': [0.15, 0.2, 0.3, 0.5, 0.7]}
inv_k_func = partial(cos_harness, default_params=DEFAULT_COS_PARAMS)
inv_k_expts = create_expts(inv_k_params, inv_k_func)

# invert the `k` schedules with small values
tmp = []
for s in inv_k_expts:
    new_vals = dict(s)
    inv = [max_val - g + min_val for g in s['schedule']]
    new_vals['schedule'] = inv
    tmp.append(new_vals)
inv_k_expts = tmp

# put all schedules together
all_k_expts = k_expts + inv_k_expts
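Before generating any images, it can help to eyeball the schedules. The short snippet below (an optional sketch, using the matplotlib import from earlier) plots every schedule in all_k_expts with its experiment title.
# optional: plot the guidance schedules as a quick sanity check
fig, ax = plt.subplots(figsize=(8, 4))
for expt in all_k_expts:
    ax.plot(expt['schedule'], label=expt['title'])
ax.set_xlabel('diffusion step')
ax.set_ylabel('guidance value')
ax.legend(fontsize=8)
plt.show()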
We repeat this for the T and Full Normalizations as well.
# create the k-decay cosine experiments
T_k_func = partial(cos_harness, default_params=DEFAULT_T_PARAMS)
T_k_expts = create_expts(k_params, T_k_func)

# create the Inverse-k-decay cosine experiments
T_inv_k_func = partial(cos_harness, default_params=DEFAULT_T_PARAMS)
T_inv_k_expts = create_expts(inv_k_params, T_inv_k_func)

# stores the inverted schedules
tmp = []
# flip the schedules
for s in T_inv_k_expts:
    new_vals = dict(s)
    inv = [max_T - g + min_T for g in s['schedule']]
    new_vals['schedule'] = inv
    tmp.append(new_vals)
T_inv_k_expts = tmp

all_T_k_expts = T_k_expts + T_inv_k_expts
Loading the Stable Diffusion v1-4 model from CompVis
The min_diffusion library loads a Stable Diffusion model from the HuggingFace hub.
# to load Stable Diffusion pipelines
from min_diffusion.core import MinimalDiffusion
# to plot generated images
from min_diffusion.utils import show_image, image_grid, plot_grid
We use it to load the Stable Diffusion v1-4 model on the GPU, with torch.float16 precision.
model_name = 'CompVis/stable-diffusion-v1-4'
device = 'cuda'
dtype = torch.float16

pipeline = MinimalDiffusion(model_name, device, dtype, generator=generator)
pipeline.load(better_vae='ema');
Using the improved VAE "ema" from stabiliy.ai
Enabling default unet attention slicing.
Text prompt for image generations
We use the familiar, running prompt in our series to generate an image:
“a photograph of an astronaut riding a horse”
# text prompt for image generations
prompt = "a photograph of an astronaut riding a horse"
Image parameters
Images will be generated over \(50\) diffusion steps. They will have a height and width of 512 x 512 pixels.
# the number of diffusion steps
num_steps = 50

# generated image dimensions
width, height = 512, 512
Running the experiments
The run function below generates images for a given prompt. It also stores the output images with a matching title for plotting and visualizations.
def run(prompt, schedules, guide_tfm=None, generator=None,
        show_each=False, test_run=False):
    """Runs a dynamic Classifier-free Guidance experiment.

    Generates an image for the text `prompt` given all the values in `schedules`.
    Uses a Guidance Transformation class from the `cf_guidance` library.
    Stores the output images with a matching title for plotting.
    Optionally shows each image as it is generated.

    If `test_run` is true, it runs a single schedule for testing.
    """
    # store generated images and their title (the experiment name)
    images, titles = [], []

    # make sure we have a valid guidance transform
    assert guide_tfm
    print(f'Using Guidance Transform: {guide_tfm}')

    # optionally run a single test schedule
    if test_run:
        print('Running a single schedule for testing.')
        schedules = schedules[:1]

    # run all schedule experiments
    for i, s in enumerate(schedules):
        # parse out the title for the current run
        cur_title = s['title']
        titles.append(cur_title)

        # create the guidance transformation
        cur_sched = s['schedule']
        gtfm = guide_tfm({'g': cur_sched})

        print(f'Running experiment [{i+1} of {len(schedules)}]: {cur_title}...')
        img = pipeline.generate(prompt, gtfm, generator=generator)
        images.append(img)

        # optionally plot the image
        if show_each:
            show_image(img, scale=1)

    print('Done.')
    return {'images': images,
            'titles': titles}
Creating the baseline image with \(G = 7.5\)
First we create the baseline image using a constant Classifier-free Guidance with \(G = 7.5\). Since this is a constant schedule, \(k\) does not come into play.
# create the baseline schedule with the new function
baseline_g = 7.5
baseline_params = {'max_val': [baseline_g]}
baseline_func = lambda *args, **kwargs: [baseline_g for _ in range(num_steps)]
baseline_expts = create_expts(baseline_params, baseline_func)
baseline_res = run(prompt, baseline_expts, guide_tfm=GuidanceTfm)
Using Guidance Transform: <class 'cf_guidance.transforms.GuidanceTfm'>
Running experiment [1 of 1]: Param: "max_val", val=7.5...
Done.
# view the baseline image
baseline_res['images'][0]
Improving the baseline with schedules and normalizations
Now let’s run our kDecay schedules with normalizations. Then we can check how they change the baseline image.
Since every run starts from the exact same noisy latents, only the schedules and normalizations are affecting the output.
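For reference, those fixed starting latents come from the seeded generator created earlier. Roughly, the sampling looks like the sketch below (an assumed outline; MinimalDiffusion handles this internally, and the exact details may differ).
# assumed sketch of how the initial latents are drawn (handled inside min_diffusion)
latent_shape = (1, 4, height // 8, width // 8)  # Stable Diffusion v1 latent dims for 512 x 512
init_latents = torch.randn(latent_shape, generator=generator)
init_latents = init_latents.to(device, dtype=dtype)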
Prediction Normalization runs
print('Running the Prediction Norm experiments...')
base_norm_res = run(prompt, all_k_expts, guide_tfm=BaseNormGuidance)
Running the Prediction Norm experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.BaseNormGuidance'>
Running experiment [1 of 8]: Param: "k_decay", val=1...
Running experiment [2 of 8]: Param: "k_decay", val=2...
Running experiment [3 of 8]: Param: "k_decay", val=5...
Running experiment [4 of 8]: Param: "k_decay", val=0.15...
Running experiment [5 of 8]: Param: "k_decay", val=0.2...
Running experiment [6 of 8]: Param: "k_decay", val=0.3...
Running experiment [7 of 8]: Param: "k_decay", val=0.5...
Running experiment [8 of 8]: Param: "k_decay", val=0.7...
Done.
T-Normalization runs
print('Running the T-Norm experiments...')
T_norm_res = run(prompt, all_T_k_expts, guide_tfm=TNormGuidance)
Running the T-Norm experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.TNormGuidance'>
Running experiment [1 of 8]: Param: "k_decay", val=1...
Running experiment [2 of 8]: Param: "k_decay", val=2...
Running experiment [3 of 8]: Param: "k_decay", val=5...
Running experiment [4 of 8]: Param: "k_decay", val=0.15...
Running experiment [5 of 8]: Param: "k_decay", val=0.2...
Running experiment [6 of 8]: Param: "k_decay", val=0.3...
Running experiment [7 of 8]: Param: "k_decay", val=0.5...
Running experiment [8 of 8]: Param: "k_decay", val=0.7...
Done.
Full Normalization runs
print('Running the Full-Norm experiments...')
full_norm_res = run(prompt, all_T_k_expts, guide_tfm=FullNormGuidance)
Running the Full-Norm experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.FullNormGuidance'>
Running experiment [1 of 8]: Param: "k_decay", val=1...
Running experiment [2 of 8]: Param: "k_decay", val=2...
Running experiment [3 of 8]: Param: "k_decay", val=5...
Running experiment [4 of 8]: Param: "k_decay", val=0.15...
Running experiment [5 of 8]: Param: "k_decay", val=0.2...
Running experiment [6 of 8]: Param: "k_decay", val=0.3...
Running experiment [7 of 8]: Param: "k_decay", val=0.5...
Running experiment [8 of 8]: Param: "k_decay", val=0.7...
Done.
Results
Prediction Normalization results
T-Normalization results
Full-Normalization results
Analysis
Inverse kDecay schedules improve the images the most. The regular kDecay schedules also helped, but the improvements are not as drastic.
The sweet spot for Inverse kDecay seems to be between \(0.15\) and \(0.3\). The best value is not constant across the normalizations either: sometimes \(0.15\) is better than \(0.2\), and vice-versa.
When in doubt, it seems \(0.2\) is a good middle ground. Perhaps we need to explore this range further, or increase the slope of the initial kDecay warmup.
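If we wanted to probe that range more closely, the existing helpers make the setup only a few lines (a hypothetical follow-up sweep, not run in this notebook; the resulting schedules would still need the same inversion step used above).
# hypothetical finer sweep around the Inverse kDecay sweet spot (not run here)
finer_k_params = {'k_decay': [0.15, 0.175, 0.2, 0.225, 0.25]}
finer_k_expts = create_expts(finer_k_params, inv_k_func)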
Prediction Normalization comparison
T-Normalization comparison
Full Normalization comparison
Comparing \(k = 0.15\) across normalizations
Comparing \(k = 0.2\) across normalizations
Comparing \(k = 0.3\) across normalizations
At this point, the difference in quality between \(0.15\) and \(0.2\) becomes subjective. It does seem that \(0.2\) makes for more stable images across the normalizations, but \(0.15\) fixed the astronaut’s leg and arm.
\(0.3\) still improves the image, but we start to lose texture and coherence in the background.
Conclusion
In Part 6 of the series we combined our best schedules so far with normalizations.
We found that normalizations with an Inverse kDecay schedule of \(k = 0.2\) or \(k = 0.15\) improved on the baseline. These schedules added detail to the background, the floor, and the astronaut’s suit, and made the horse more anatomically correct. This confirms our explorations in previous notebooks, which showed that the Guidance scaling has to warm up quickly and/or stay high for as long as possible.
In Part 7, we will check if these gains hold across different Stable Diffusion models.