import os
import random
from functools import partial
import torch
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
Classifier-free Guidance with Cosine Schedules Pt. 5
Exploring a range of guidance values for T-Normalization.
Introduction
This notebook is Part 5 in a series on dynamic Classifier-free Guidance. It explores smaller \(G\) values for normalizations.
Recap of Parts 1-4
The first three parts explored how to turn Classifier-free Guidance into a dynamic process. We found an initial set of schedules and normalizers that seem to improve the quality of Diffusion images. We then dug in and refined a few of the most promising schedules.
Part 5: Exploring values for T-Normalization
Part 5 answers the question: what should the value of \(G_\text{small}\) be for T-Normalization and Full Normalization?
Recall that these two normalizations scale the update vector \(\left(t - u \right)\). That places the update vector on a different scale than the unconditioned vector \(u\). If we then scale the update vector by a large scalar, say \(G = 7.5\), the output collapses to noise. In fact, it seems to collapse to the true mode of the latent image distribution: uniform, brown values.
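To make the scale issue concrete, here is a minimal NumPy sketch. It assumes that T-Normalization divides the update vector \(\left(t - u \right)\) by its own norm before scaling by \(G\). That reading, the helper name `t_norm_guidance`, and the toy vectors are illustrative assumptions, not the `cf_guidance` implementation.

```python
import numpy as np

def t_norm_guidance(u, t, g):
    """Toy guidance step: u + g * (t - u) / ||t - u||.

    Assumption: the update vector is rescaled to unit norm before
    being multiplied by the guidance scale `g`.
    """
    update = t - u
    unit_update = update / np.linalg.norm(update)
    return u + g * unit_update

# toy stand-ins for the unconditioned and text-conditioned predictions
rng = np.random.default_rng(0)
u = rng.normal(size=16)
t = u + 0.01 * rng.normal(size=16)  # t is a small perturbation of u

for g in (0.15, 7.5):
    pred = t_norm_guidance(u, t, g)
    # after normalization the step length equals g, whatever ||t - u|| was
    step = np.linalg.norm(pred - u)
    print(f"g={g}: step length={step:.3f}, ||u||={np.linalg.norm(u):.3f}")
```

In this toy setup, \(G\) directly sets how far the prediction moves away from \(u\), which is one way to see why a value tuned for the un-normalized update may be far too large once the update has unit length.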
These two normalizations are very promising: they improve the syntax and details of the image. However, we only explored a single value of \(G_\text{small} = 0.15\). This is very different from the default \(G = 7.5\) that has been thoroughly explored in regular Classifier-free Guidance.
This notebook tries to find a good starting point for \(G_\text{small}\), so we can try the normalizations with our best schedules so far.
Python imports
We start with a few basic python imports.
Seed for reproducibility
seed_everything makes sure that the results are reproducible across notebooks.
# set the seed and pseudo random number generator
SEED = 1024
def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    generator = torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return generator

# for sampling the initial, noisy latents
generator = seed_everything(SEED)
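As a quick sanity check of what seeding buys us, the sketch below uses only Python's built-in `random` module. The helper `seeded_draws` is hypothetical and not part of the notebook; it only illustrates that reseeding with the same value replays the same sequence of draws.

```python
import random

def seeded_draws(seed, n=3):
    """Reseed Python's global RNG, then draw `n` floats."""
    random.seed(seed)
    return [random.random() for _ in range(n)]

first = seeded_draws(1024)
second = seeded_draws(1024)  # same seed, so the same sequence
other = seeded_draws(1025)   # different seed, different sequence

print(first == second)
print(first == other)
```

The same idea applies to the NumPy and PyTorch generators seeded above, each of which keeps its own state.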
Constant schedules with a range of \(G_\text{small}\) values
We can try different \(G_\text{small}\) values using the cf_guidance library.
# helpers to create cosine schedules
from cf_guidance.schedules import get_cos_sched
# normalizations for classifier-free guidance
from cf_guidance.transforms import TNormGuidance, FullNormGuidance
For the other schedule parameters, we will use the same values from the running series on dynamic Classifier-free Guidance.
# Default schedule parameters from the blog post
######################################
max_val = 0.15 # guidance scaling value
min_val = 0.0 # minimum guidance scaling
num_steps = 50 # number of diffusion steps
num_warmup_steps = 0 # number of warmup steps
warmup_init_val = 0 # the initial warmup value
num_cycles = 0.5 # number of cosine cycles
k_decay = 1 # k-decay for cosine curve scaling
######################################
DEFAULT_COS_PARAMS = {
    'max_val': max_val,
    'num_steps': num_steps,
    'min_val': min_val,
    'num_cycles': num_cycles,
    'k_decay': k_decay,
    'num_warmup_steps': num_warmup_steps,
    'warmup_init_val': warmup_init_val,
}
def cos_harness(new_params={}, default_params={}):
    '''Creates cosine schedules with updated parameters in `new_params`
    '''
    # start from the given baseline `cos_params`
    cos_params = dict(default_params)
    # update the schedule with any new parameters
    cos_params.update(new_params)
    # return the new cosine schedule
    sched = get_cos_sched(**cos_params)
    return sched
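For intuition about what the default parameters describe, here is a rough sketch of a half-cycle cosine schedule. It is not the `get_cos_sched` implementation: the function `half_cosine_schedule` is hypothetical, it uses the common cosine-decay formula as an assumption, and it ignores warmup and k-decay entirely.

```python
import math

def half_cosine_schedule(max_val=0.15, min_val=0.0, num_steps=50, num_cycles=0.5):
    """Hypothetical cosine schedule: starts at max_val, ends at min_val."""
    sched = []
    for step in range(num_steps):
        progress = step / max(1, num_steps - 1)  # goes from 0.0 to 1.0
        cos_term = 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress))
        sched.append(min_val + (max_val - min_val) * cos_term)
    return sched

sched = half_cosine_schedule()
print(len(sched), round(sched[0], 4), round(sched[-1], 4))
```

With `num_cycles = 0.5`, the curve covers half a cosine period, so under these assumptions the guidance value decays smoothly from `max_val` on the first step to `min_val` on the last.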
def create_expts(params: dict, schedule_func) -> list:
    '''Creates a list of experiments.
    Each element is a dictionary with the name, value, and schedule for a given parameter.
    A `title` field is also added for easy plotting.
    '''
    names = sorted(params)
    expts = []
    # step through parameter names and their values
    for i,name in enumerate(names):
        for j,val in enumerate(params[name]):
            # create the experiment
            expt = {'param_name': name,
                    'val': val,
                    'schedule': schedule_func(val)}
            # 'schedule': schedule_func({name: val})}
            # name for plotting
            expt['title'] = f'Param: "{name}", val={val}'
            # add it to the experiment list
            expts.append(expt)
    return expts
# create the constant G_small experiments
const_params = {'max_val': [0.01, 0.03, 0.05, 0.08, 0.1, 0.15, 0.2, 0.22, 0.25, 0.3]}
const_func = lambda val: [val for _ in range(num_steps)]
const_expts = create_expts(const_params, const_func)
Plotting the \(G_\text{small}\) values
Loading the openjourney model from Prompt Hero
The min_diffusion library loads a Stable Diffusion model from the HuggingFace hub.
# to load Stable Diffusion pipelines
from min_diffusion.core import MinimalDiffusion
# to plot generated images
from min_diffusion.utils import show_image, image_grid, plot_grid
2022-11-24 20:59:11.535760: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
We use it to load the openjourney model on the GPU in torch.float16 precision.
model_name = 'prompthero/openjourney'
device = 'cuda'
dtype = torch.float16
pipeline = MinimalDiffusion(model_name, device, dtype, generator=generator)
pipeline.load();
Enabling default unet attention slicing.
Text prompt for image generations
We use the familiar, running prompt in our series to generate an image:
“a photograph of an astronaut riding a horse”
The openjourney model was fine-tuned to create images in the style of Midjourney v4.
To enable this fine-tuned style, we need to add the keywords "mdjrny-v4 style" at the start of the prompt.
# text prompt for image generations
prompt = "mdjrny-v4 style a photograph of an astronaut riding a horse"
Image parameters
Images will be generated over \(50\) diffusion steps. They will have a height and width of 512 x 512 pixels.
# the number of diffusion steps
num_steps = 50

# generated image dimensions
width, height = 512, 512
Running the experiments
The run function below generates images for the given prompt.
It also stores the output images with a matching title for plotting and visualizations.
def run(prompt, schedules, guide_tfm=None, generator=None,
        show_each=False, test_run=False):
    """Runs a dynamic Classifier-free Guidance experiment.

    Generates an image for the text `prompt` given all the values in `schedules`.
    Uses a Guidance Transformation class from the `cf_guidance` library.
    Stores the output images with a matching title for plotting.
    Optionally shows each image as it is generated.
    If `test_run` is true, it runs a single schedule for testing.
    """
    # store generated images and their title (the experiment name)
    images, titles = [], []

    # make sure we have a valid guidance transform
    assert guide_tfm
    print(f'Using Guidance Transform: {guide_tfm}')

    # optionally run a single test schedule
    if test_run:
        print(f'Running a single schedule for testing.')
        schedules = schedules[:1]

    # run all schedule experiments
    for i,s in enumerate(schedules):
        # parse out the title for the current run
        cur_title = s['title']
        titles.append(cur_title)

        # create the guidance transformation
        cur_sched = s['schedule']
        gtfm = guide_tfm({'g': cur_sched})

        print(f'Running experiment [{i+1} of {len(schedules)}]: {cur_title}...')
        img = pipeline.generate(prompt, gtfm, generator=generator)
        images.append(img)

        # optionally plot the image
        if show_each:
            show_image(img, scale=1)

    print('Done.')
    return {'images': images,
            'titles': titles}
Sweeping the \(G_\text{small}\) values
Now we generate images for the range of constant \(G_\text{small}\) values. Then we will check the outputs to see what a good, default value might be.
T-Normalization with \(G_\text{small}\) sweep
print('Running the k-Sweep experiments...')
t_norm_res = run(prompt, const_expts, guide_tfm=TNormGuidance)
Running the k-Sweep experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.TNormGuidance'>
Running experiment [1 of 10]: Param: "max_val", val=0.01...
Running experiment [2 of 10]: Param: "max_val", val=0.03...
Running experiment [3 of 10]: Param: "max_val", val=0.05...
Running experiment [4 of 10]: Param: "max_val", val=0.08...
Running experiment [5 of 10]: Param: "max_val", val=0.1...
Running experiment [6 of 10]: Param: "max_val", val=0.15...
Running experiment [7 of 10]: Param: "max_val", val=0.2...
Running experiment [8 of 10]: Param: "max_val", val=0.22...
Running experiment [9 of 10]: Param: "max_val", val=0.25...
Running experiment [10 of 10]: Param: "max_val", val=0.3...
Done.
Full Normalization with \(G_\text{small}\) sweep
print('Running the k-Sweep experiments...')
full_norm_res = run(prompt, const_expts, guide_tfm=FullNormGuidance)
Running the k-Sweep experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.FullNormGuidance'>
Running experiment [1 of 10]: Param: "max_val", val=0.01...
Running experiment [2 of 10]: Param: "max_val", val=0.03...
Running experiment [3 of 10]: Param: "max_val", val=0.05...
Running experiment [4 of 10]: Param: "max_val", val=0.08...
Running experiment [5 of 10]: Param: "max_val", val=0.1...
Running experiment [6 of 10]: Param: "max_val", val=0.15...
Running experiment [7 of 10]: Param: "max_val", val=0.2...
Running experiment [8 of 10]: Param: "max_val", val=0.22...
Running experiment [9 of 10]: Param: "max_val", val=0.25...
Running experiment [10 of 10]: Param: "max_val", val=0.3...
Done.
Results
T-Normalization \(G_\text{small}\) results
Full Normalization \(G_\text{small}\) results
Analysis
The sweet spot for \(G_\text{small}\) seems to be between 0.08 and 0.2. However, the images change massively between neighboring values, much more than expected.
It seems that values below 0.08 are too small. The images never form or are too abstract. Likewise, values above 0.2 start to smear the image with weird colors. Both of these results could be ok for artistic generations, but in this case we are trying to improve the realism and quality of the images.
Phase change in the image
Most interesting, there is a “phase change” between the values of 0.08 and 0.1. The image completely changes style and pose from the previous results we’ve seen so far in the series. This phase change on its own deserves more exploration! What happens around these values of \(G_\text{small}\)?
Let’s re-run experiments focused on this range. We will pick 12 points uniformly spread between 0.08 and 0.102 to see if we can catch where the phase changes.
Re-runs to find the phase change
low_bound = 0.08
hi_bound = 0.102
npoints = 11

points = np.linspace(low_bound, hi_bound, npoints+1); points
array([0.08 , 0.082, 0.084, 0.086, 0.088, 0.09 , 0.092, 0.094, 0.096,
0.098, 0.1 , 0.102])
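The bounds and point count above are easier to follow with the arithmetic spelled out. The sketch below repeats the same `np.linspace` call and adds one optional step, rounding the values before using them in titles, which is a suggestion rather than something the notebook does.

```python
import numpy as np

low_bound, hi_bound, npoints = 0.08, 0.102, 11

# np.linspace includes both endpoints, so npoints + 1 samples
# means npoints equal intervals between them
points = np.linspace(low_bound, hi_bound, npoints + 1)
step = (hi_bound - low_bound) / npoints

print(len(points))     # number of experiments in the sweep
print(round(step, 6))  # spacing between neighboring values

# rounding keeps float noise such as 0.09799999999999999 out of plot titles
titles = [f'val={round(float(p), 3)}' for p in points]
print(titles[9])
```

Choosing an upper bound of 0.102 with 11 intervals gives a clean spacing of 0.002, so the sweep lands on the round values 0.088, 0.09, and so on.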
# create the constant G_small experiments
phase_params = {'max_val': list(points)}
phase_func = lambda val: [val for _ in range(num_steps)]
phase_expts = create_expts(phase_params, phase_func)
print('Running the phase change k-Sweep experiments...')
phase_res = run(prompt, phase_expts, guide_tfm=TNormGuidance)
Running the phase change k-Sweep experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.TNormGuidance'>
Running experiment [1 of 12]: Param: "max_val", val=0.08...
Running experiment [2 of 12]: Param: "max_val", val=0.082...
Running experiment [3 of 12]: Param: "max_val", val=0.084...
Running experiment [4 of 12]: Param: "max_val", val=0.086...
Running experiment [5 of 12]: Param: "max_val", val=0.088...
Running experiment [6 of 12]: Param: "max_val", val=0.09...
Running experiment [7 of 12]: Param: "max_val", val=0.092...
Running experiment [8 of 12]: Param: "max_val", val=0.094...
Running experiment [9 of 12]: Param: "max_val", val=0.096...
Running experiment [10 of 12]: Param: "max_val", val=0.09799999999999999...
Running experiment [11 of 12]: Param: "max_val", val=0.09999999999999999...
Running experiment [12 of 12]: Param: "max_val", val=0.102...
Done.
T-Normalization with \(G_\text{small}\) phase change
It seems the phase change happens between 0.088 and 0.09. Let’s check if this is also true for Full Normalization.
Full Normalization with \(G_\text{small}\) phase change
print('Running the phase change k-Sweep experiments...')
full_phase_res = run(prompt, phase_expts, guide_tfm=FullNormGuidance)
Running the phase change k-Sweep experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.FullNormGuidance'>
Running experiment [1 of 12]: Param: "max_val", val=0.08...
Running experiment [2 of 12]: Param: "max_val", val=0.082...
Running experiment [3 of 12]: Param: "max_val", val=0.084...
Running experiment [4 of 12]: Param: "max_val", val=0.086...
Running experiment [5 of 12]: Param: "max_val", val=0.088...
Running experiment [6 of 12]: Param: "max_val", val=0.09...
Running experiment [7 of 12]: Param: "max_val", val=0.092...
Running experiment [8 of 12]: Param: "max_val", val=0.094...
Running experiment [9 of 12]: Param: "max_val", val=0.096...
Running experiment [10 of 12]: Param: "max_val", val=0.09799999999999999...
Running experiment [11 of 12]: Param: "max_val", val=0.09999999999999999...
Running experiment [12 of 12]: Param: "max_val", val=0.102...
Done.
The phase change happens in the same place! In fact, the change is more pronounced: there is definitely something strange with the horse’s head as we hit the phase transition around \(G_\text{small} = 0.09\).
However, it seems that the image grows darker and less clear as we move away from the phase change. The horse’s body is less illuminated and is even hard to see.
One last check: what if the images before the phase change are better? We already saw that 0.05 was a bit too low, but what about values between 0.06 and 0.08?
Re-runs to find earlier, potentially better values
low_bound = 0.06
hi_bound = 0.082
npoints = 11

points = np.linspace(low_bound, hi_bound, npoints+1); points
array([0.06 , 0.062, 0.064, 0.066, 0.068, 0.07 , 0.072, 0.074, 0.076,
0.078, 0.08 , 0.082])
# create the constant G_small experiments
early_phase_params = {'max_val': list(points)}
early_phase_func = lambda val: [val for _ in range(num_steps)]
early_phase_expts = create_expts(early_phase_params, early_phase_func)
print('Running the phase change k-Sweep experiments...')
early_phase_res = run(prompt, early_phase_expts, guide_tfm=TNormGuidance)
Running the phase change k-Sweep experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.TNormGuidance'>
Running experiment [1 of 12]: Param: "max_val", val=0.06...
Running experiment [2 of 12]: Param: "max_val", val=0.062...
Running experiment [3 of 12]: Param: "max_val", val=0.064...
Running experiment [4 of 12]: Param: "max_val", val=0.066...
Running experiment [5 of 12]: Param: "max_val", val=0.068...
Running experiment [6 of 12]: Param: "max_val", val=0.07...
Running experiment [7 of 12]: Param: "max_val", val=0.07200000000000001...
Running experiment [8 of 12]: Param: "max_val", val=0.074...
Running experiment [9 of 12]: Param: "max_val", val=0.076...
Running experiment [10 of 12]: Param: "max_val", val=0.078...
Running experiment [11 of 12]: Param: "max_val", val=0.08...
Running experiment [12 of 12]: Param: "max_val", val=0.082...
Done.
This is a mixed bag. The anatomy of the horse’s face is better at lower values, but the rest of the image gains some strange artifacts. For example, the horse’s leg starts fraying, and the astronaut merges with the horse’s body. It is safe to say we are in “too small” territory for \(G_\text{small}\).
The final check: \(G_\text{small}\) from \(0.1\) to \(0.2\)
There is a noticeable change in the image somewhere between 0.1 and 0.2. Let’s do a small sweep in this range to see what happens.
low_bound = 0.1
hi_bound = 0.202
npoints = 11

points = np.linspace(low_bound, hi_bound, npoints+1); points
array([0.1 , 0.10927273, 0.11854545, 0.12781818, 0.13709091,
0.14636364, 0.15563636, 0.16490909, 0.17418182, 0.18345455,
0.19272727, 0.202 ])
# create the constant G_small experiments
late_phase_params = {'max_val': list(points)}
late_phase_func = lambda val: [val for _ in range(num_steps)]
late_phase_expts = create_expts(late_phase_params, late_phase_func)
print('Running the phase change k-Sweep experiments...')
late_phase_res = run(prompt, late_phase_expts, guide_tfm=TNormGuidance)
Running the phase change k-Sweep experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.TNormGuidance'>
Running experiment [1 of 12]: Param: "max_val", val=0.1...
Running experiment [2 of 12]: Param: "max_val", val=0.10927272727272727...
Running experiment [3 of 12]: Param: "max_val", val=0.11854545454545455...
Running experiment [4 of 12]: Param: "max_val", val=0.12781818181818183...
Running experiment [5 of 12]: Param: "max_val", val=0.1370909090909091...
Running experiment [6 of 12]: Param: "max_val", val=0.14636363636363636...
Running experiment [7 of 12]: Param: "max_val", val=0.15563636363636363...
Running experiment [8 of 12]: Param: "max_val", val=0.16490909090909092...
Running experiment [9 of 12]: Param: "max_val", val=0.1741818181818182...
Running experiment [10 of 12]: Param: "max_val", val=0.18345454545454545...
Running experiment [11 of 12]: Param: "max_val", val=0.19272727272727275...
Running experiment [12 of 12]: Param: "max_val", val=0.202...
Done.
Another phase change! It happens at almost exactly twice the \(G_\text{small}\) of the previous phase change. And, after this second phase change, we regain some illumination on the horse’s body.
Checking for phase change multiples
Will we find another phase change at around three times the value of the first one? Let’s find out.
low_bound = 0.25
hi_bound = 0.29
npoints = 11

points = np.linspace(low_bound, hi_bound, npoints+1); points
array([0.25 , 0.25363636, 0.25727273, 0.26090909, 0.26454545,
0.26818182, 0.27181818, 0.27545455, 0.27909091, 0.28272727,
0.28636364, 0.29 ])
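A quick arithmetic check shows why this range was a reasonable place to look. The sketch below takes the bracket found for the first phase change (0.088 to 0.09) and scales it; treating the phase changes as exact integer multiples is only a working hypothesis here.

```python
# the first phase change was bracketed by these two sweep values
first_lo, first_hi = 0.088, 0.09

# sweep range used for the "three times" check
low_bound, hi_bound = 0.25, 0.29

for k in (2, 3):
    lo, hi = k * first_lo, k * first_hi
    print(f"{k}x first phase change: {lo:.3f} to {hi:.3f}")

# does the 3x window fall inside the sweep range?
triple_lo, triple_hi = 3 * first_lo, 3 * first_hi
covered = low_bound <= triple_lo and triple_hi <= hi_bound
print(covered)
```

If the pattern held, a third phase change would be expected near 0.264 to 0.27, which sits inside the sweep from 0.25 to 0.29.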
# create the constant G_small experiments
later_phase_params = {'max_val': list(points)}
later_phase_func = lambda val: [val for _ in range(num_steps)]
later_phase_expts = create_expts(later_phase_params, later_phase_func)
print('Running the phase change k-Sweep experiments...')
later_phase_res = run(prompt, later_phase_expts, guide_tfm=TNormGuidance)
Running the phase change k-Sweep experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.TNormGuidance'>
Running experiment [1 of 12]: Param: "max_val", val=0.25...
Running experiment [2 of 12]: Param: "max_val", val=0.25363636363636366...
Running experiment [3 of 12]: Param: "max_val", val=0.25727272727272726...
Running experiment [4 of 12]: Param: "max_val", val=0.2609090909090909...
Running experiment [5 of 12]: Param: "max_val", val=0.26454545454545453...
Running experiment [6 of 12]: Param: "max_val", val=0.2681818181818182...
Running experiment [7 of 12]: Param: "max_val", val=0.2718181818181818...
Running experiment [8 of 12]: Param: "max_val", val=0.27545454545454545...
Running experiment [9 of 12]: Param: "max_val", val=0.27909090909090906...
Running experiment [10 of 12]: Param: "max_val", val=0.2827272727272727...
Running experiment [11 of 12]: Param: "max_val", val=0.2863636363636364...
Running experiment [12 of 12]: Param: "max_val", val=0.29...
Done.
There is no clear phase change, but the image is starting to fall apart. It is safe to say we are in territory where \(G_\text{small}\) is too large.
\(G_\text{small} = 0.25\) is the last image where we have a fully correct, non-smeared astronaut.
Cosines for the best \(G_\text{small}\)
We saw that \(G_\text{small}\) should be between 0.08 and 0.25. What happens if we try cosine schedules around these values?
def create_cos_expts(params: dict, schedule_func) -> list:
    '''Creates a list of experiments.
    Each element is a dictionary with the name, value, and schedule for a given parameter.
    A `title` field is also added for easy plotting.
    '''
    names = sorted(params)
    expts = []
    # step through parameter names and their values
    for i,name in enumerate(names):
        for j,val in enumerate(params[name]):
            # create the experiment
            expt = {'param_name': name,
                    'val': val,
                    'schedule': schedule_func({name: val})}
            # name for plotting
            expt['title'] = f'Param: "{name}", val={val}'
            # add it to the experiment list
            expts.append(expt)
    return expts
Let’s sweep a few cosine schedules across the ideal \(G_\text{small}\) range.
low_cos_bound = 0.08
hi_cos_bound = 0.25
npoints = 11
cos_points = np.linspace(low_cos_bound, hi_cos_bound, npoints+1); cos_points
array([0.08 , 0.09545455, 0.11090909, 0.12636364, 0.14181818,
0.15727273, 0.17272727, 0.18818182, 0.20363636, 0.21909091,
0.23454545, 0.25 ])
# create the cosine G_small experiments
cos_params = {'max_val': list(cos_points)}
cos_func = partial(cos_harness, default_params=DEFAULT_COS_PARAMS)
cos_expts = create_cos_expts(cos_params, cos_func)
print('Running the phase change k-Sweep experiments...')
cos_res = run(prompt, cos_expts, guide_tfm=TNormGuidance)
Running the phase change k-Sweep experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.TNormGuidance'>
Running experiment [1 of 12]: Param: "max_val", val=0.08...
Running experiment [2 of 12]: Param: "max_val", val=0.09545454545454546...
Running experiment [3 of 12]: Param: "max_val", val=0.1109090909090909...
Running experiment [4 of 12]: Param: "max_val", val=0.12636363636363634...
Running experiment [5 of 12]: Param: "max_val", val=0.14181818181818182...
Running experiment [6 of 12]: Param: "max_val", val=0.1572727272727273...
Running experiment [7 of 12]: Param: "max_val", val=0.17272727272727273...
Running experiment [8 of 12]: Param: "max_val", val=0.18818181818181817...
Running experiment [9 of 12]: Param: "max_val", val=0.20363636363636362...
Running experiment [10 of 12]: Param: "max_val", val=0.21909090909090906...
Running experiment [11 of 12]: Param: "max_val", val=0.23454545454545456...
Running experiment [12 of 12]: Param: "max_val", val=0.25...
Done.
For completeness, let’s try the Cosine sweep on Full Normalization.
print('Running the phase change k-Sweep experiments...')
full_norm_cos_res = run(prompt, cos_expts, guide_tfm=FullNormGuidance)
Running the phase change k-Sweep experiments...
Using Guidance Transform: <class 'cf_guidance.transforms.FullNormGuidance'>
Running experiment [1 of 12]: Param: "max_val", val=0.08...
Running experiment [2 of 12]: Param: "max_val", val=0.09545454545454546...
Running experiment [3 of 12]: Param: "max_val", val=0.1109090909090909...
Running experiment [4 of 12]: Param: "max_val", val=0.12636363636363634...
Running experiment [5 of 12]: Param: "max_val", val=0.14181818181818182...
Running experiment [6 of 12]: Param: "max_val", val=0.1572727272727273...
Running experiment [7 of 12]: Param: "max_val", val=0.17272727272727273...
Running experiment [8 of 12]: Param: "max_val", val=0.18818181818181817...
Running experiment [9 of 12]: Param: "max_val", val=0.20363636363636362...
Running experiment [10 of 12]: Param: "max_val", val=0.21909090909090906...
Running experiment [11 of 12]: Param: "max_val", val=0.23454545454545456...
Running experiment [12 of 12]: Param: "max_val", val=0.25...
Done.
Conclusion
This was Part 5 in our series on dynamic Classifier-free Guidance.
We found a good range of \(G_\text{small}\) for both T and Full Normalizations. We also tried some basic Cosine Schedules around this ideal range.
In Part 6, we will explore the best schedules for each type of normalization:
- Prediction Normalization
- T-Normalization
- Full Normalization
Specifically, we will plug in the best kDecay Cosine Schedules so far. At that point, we should be able to see a measurable and consistent improvement over the original, constant Guidance.