Part 7 of 9
Exploring a range of guidance values for T-Normalization.
This notebook is Part 5 in a series on dynamic Classifier-free Guidance. It explores smaller $G$ values for normalizations.
The first three parts explored how to turn Classifier-free Guidance into a dynamic process. We found an initial set of schedules and normalizers that seem to improve the quality of Diffusion images. We then dug in and refined a few of the most promising schedules.
Part 5 answers the question: what should the value of $G_\text{small}$ be for T-Normalization and Full Normalization?
Recall that these two normalizations rescale the update vector $\left(t - u \right)$, which places it on a different scale than the unconditioned vector $u$. If we then multiply the update vector by a large scalar, say $G = 7.5$, the output collapses to noise. In fact, it seems to collapse to the true mode of the latent image distribution: uniform, brown values.
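To make the scale mismatch concrete, here is a minimal standard-library sketch. It assumes T-Normalization rescales $\left(t - u\right)$ to the norm of $u$ before multiplying by $G$; that formula and the helper names `l2_norm` and `t_norm_update` are illustrative assumptions, not the `cf_guidance` implementation.

```python
import math

def l2_norm(v):
    """Euclidean norm of a plain list of floats."""
    return math.sqrt(sum(x * x for x in v))

def t_norm_update(u, t, g):
    """Assumed T-Normalization update: (t - u) rescaled to the norm of u, times g."""
    diff = [ti - ui for ti, ui in zip(t, u)]
    scale = l2_norm(u) / l2_norm(diff)
    return [g * scale * d for d in diff]

# toy stand-ins for the unconditioned (u) and text-conditioned (t) predictions
u = [3.0, 4.0, 0.0]
t = [3.2, 3.9, 0.1]

ratios = {}
for g in (0.15, 7.5):
    update = t_norm_update(u, t, g)
    # size of the guidance update relative to the unconditioned vector
    ratios[g] = l2_norm(update) / l2_norm(u)
    print(f'G = {g}: update is {ratios[g]:.2f}x the size of u')
```

Under this assumption the update is exactly $G$ times the size of $u$, so $G = 7.5$ would swamp the unconditioned prediction while $G = 0.15$ only nudges it.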
These two normalizations are very promising: they improve the syntax and details of the image. However, we have only explored a single value, $G_\text{small} = 0.15$. This is very different from the default $G = 7.5$ that has been thoroughly explored in regular Classifier-free Guidance.
This notebook tries to find a good starting point for $G_\text{small}$, so we can try the normalizations with our best schedules so far.
We start with a few basic Python imports.
import os
import random
from functools import partial
import torch
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
`seed_everything` makes sure that the results are reproducible across notebooks.
# set the seed and pseudo random number generator
SEED = 1024
def seed_everything(seed):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    generator = torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return generator
# for sampling the initial, noisy latents
generator = seed_everything(SEED)
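As a small illustration of why the fixed seed matters, the sketch below uses only the standard library rather than torch. `sample_noise` is a hypothetical helper standing in for "sampling the initial, noisy latents"; it is not part of this notebook or of any library used here.

```python
import random

def sample_noise(seed, n=4):
    """Hypothetical stand-in for drawing initial noise from a seeded generator."""
    rng = random.Random(seed)  # independent generator with a fixed seed
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

first = sample_noise(1024)
second = sample_noise(1024)
other = sample_noise(2048)

# the same seed reproduces the same noise; a different seed does not
print(first == second, first == other)
```

The same idea applies to the torch generator returned by `seed_everything`: starting every experiment from the same noise means that differences between images come from the guidance values, not from the random draw.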
We can try different $G_\text{small}$ values using the `cf_guidance` library.
# helpers to create cosine schedules
from cf_guidance.schedules import get_cos_sched
# normalizations for classifier-free guidance
from cf_guidance.transforms import TNormGuidance, FullNormGuidance
For the other schedule parameters, we will use the same values from the running series on dynamic Classifier-free Guidance.
# Default schedule parameters from the blog post
######################################
max_val = 0.15 # guidance scaling value
min_val = 0.0 # minimum guidance scaling
num_steps = 50 # number of diffusion steps
num_warmup_steps = 0 # number of warmup steps
warmup_init_val = 0 # the initial warmup value
num_cycles = 0.5 # number of cosine cycles
k_decay = 1 # k-decay for cosine curve scaling
######################################
DEFAULT_COS_PARAMS = {
    'max_val': max_val,
    'num_steps': num_steps,
    'min_val': min_val,
    'num_cycles': num_cycles,
    'k_decay': k_decay,
    'num_warmup_steps': num_warmup_steps,
    'warmup_init_val': warmup_init_val,
}
def cos_harness(new_params=None, default_params=None):
    '''Creates cosine schedules with updated parameters in `new_params`
    '''
    # start from a copy of the given baseline `default_params`
    cos_params = dict(default_params or {})
    # update the schedule with any new parameters
    cos_params.update(new_params or {})
    # return the new cosine schedule
    sched = get_cos_sched(**cos_params)
    return sched
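For intuition about what the default parameters describe, here is a hedged, standard-library sketch of a half-cosine decay from `max_val` to `min_val`. `toy_cos_sched` is a hypothetical stand-in, not the `get_cos_sched` implementation: it ignores warmup, and it assumes `k_decay` acts as an exponent on the schedule progress.

```python
import math

def toy_cos_sched(max_val, min_val, num_steps, num_cycles=0.5, k_decay=1):
    """Hypothetical cosine schedule (no warmup), for illustration only."""
    sched = []
    for step in range(num_steps):
        # fraction of the schedule completed, in [0, 1]
        progress = (step / max(1, num_steps - 1)) ** k_decay
        # cosine factor: 1 at the start; with num_cycles=0.5 it reaches 0 at the end
        cos_factor = 0.5 * (1.0 + math.cos(math.pi * 2.0 * num_cycles * progress))
        sched.append(min_val + (max_val - min_val) * cos_factor)
    return sched

sched = toy_cos_sched(max_val=0.15, min_val=0.0, num_steps=50)
print(f'first: {sched[0]:.3f}, last: {sched[-1]:.3f}, steps: {len(sched)}')
```

With `num_cycles = 0.5` the curve is a single smooth descent, so the guidance value starts at `max_val` and ends at `min_val`.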
def create_expts(params: dict, schedule_func) -> list:
    '''Creates a list of experiments.
    Each element is a dictionary with the name, value, and schedule for a given parameter.
    A `title` field is also added for easy plotting.
    '''
    names = sorted(params)
    expts = []
    # step through parameter names and their values
    for name in names:
        for val in params[name]:
            # create the experiment
            expt = {'param_name': name,
                    'val': val,
                    'schedule': schedule_func(val)}
                    # 'schedule': schedule_func({name: val})}
            # name for plotting
            expt['title'] = f'Param: "{name}", val={val}'
            # add it to the experiment list
            expts.append(expt)
    return expts
# create the constant G_small experiments (one flat schedule per value)
const_params = {'max_val': [0.01, 0.03, 0.05, 0.08, 0.1, 0.15, 0.2, 0.22, 0.25, 0.3]}
const_func = lambda val: [val for _ in range(num_steps)]
const_expts = create_expts(const_params, const_func)
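For reference, each entry of `const_expts` should look roughly like the hand-built entries below. This sketch uses a shortened sweep, and `toy_expts` is a hypothetical name used only for illustration.

```python
num_steps = 50

# a shortened version of the sweep, built by hand
toy_expts = []
for val in [0.01, 0.15, 0.3]:
    toy_expts.append({
        'param_name': 'max_val',
        'val': val,
        # constant schedule: the same guidance value at every diffusion step
        'schedule': [val] * num_steps,
        'title': f'Param: "max_val", val={val}',
    })

for expt in toy_expts:
    print(expt['title'], '->', len(expt['schedule']), 'steps')
```

Because every schedule is flat, any difference between the generated images can be attributed to the value of $G_\text{small}$ alone.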
#| echo: false
#| output: true
colors=list(mcolors.TABLEAU_COLORS)
# setup the plot
fig,ax = plt.subplots(figsize=(12,8))
plt.title('Constant Schedules for G_small', fontsize='xx-large')
plt.xlabel('Diffusion timesteps', fontsize='x-large')
plt.ylabel('Guidance parameter', fontsize='x-large')
# plot each G_small value
for idx,s in enumerate(const_expts):
    ax.plot(s['schedule'], c=colors[idx], label=f'G_small: {s["val"]:.4f}')
plt.legend()
plt.tight_layout();
openjourney model from Prompt Hero
The `min_diffusion` library loads a Stable Diffusion model from the HuggingFace hub.
# to load Stable Diffusion pipelines
from min_diffusion.core import MinimalDiffusion
# to plot generated images
from min_diffusion.utils import show_image, image_grid, plot_grid
We use it to load the `openjourney` model on the GPU in `torch.float16` precision.
model_name = 'prompthero/openjourney'
device = 'cuda'
dtype = torch.float16
pipeline = MinimalDiffusion(model_name, device, dtype, generator=generator)
pipeline.load();
We use the familiar, running prompt in our series to generate an image:
"a photograph of an astronaut riding a horse"
:::: {.callout-important}
The `openjourney` model was fine-tuned to create images in the style of Midjourney v4.
To enable this fine-tuned style, we need to add the keyword "mdjrny-v4" at the start of the prompt.
::::
# text prompt for image generations
prompt = "mdjrny-v4 style a photograph of an astronaut riding a horse"
Images will be generated over $50$ diffusion steps. They will have a height and width of 512 x 512 pixels.
# the number of diffusion steps
num_steps = 50
# generated image dimensions
width, height = 512, 512
The `run` function below generates images for the given `prompt`.
It also stores the output images with a matching title for plotting and visualizations.
def run(prompt, schedules, guide_tfm=None, generator=None,
        show_each=False, test_run=False):
    """Runs a dynamic Classifier-free Guidance experiment.
    Generates an image for the text `prompt` given all the values in `schedules`.
    Uses a Guidance Transformation class from the `cf_guidance` library.
    Stores the output images with a matching title for plotting.
    Optionally shows each image as it's generated.
    If `test_run` is true, it runs a single schedule for testing.
    """
    # store generated images and their title (the experiment name)
    images, titles = [], []
    # make sure we have a valid guidance transform
    assert guide_tfm, 'A guidance transform class is required'
    print(f'Using Guidance Transform: {guide_tfm}')
    # optionally run a single test schedule
    if test_run:
        print('Running a single schedule for testing.')
        schedules = schedules[:1]
    # run all schedule experiments
    for i,s in enumerate(schedules):
        # parse out the title for the current run
        cur_title = s['title']
        titles.append(cur_title)
        # create the guidance transformation
        cur_sched = s['schedule']
        gtfm = guide_tfm({'g': cur_sched})
        print(f'Running experiment [{i+1} of {len(schedules)}]: {cur_title}...')
        img = pipeline.generate(prompt, gtfm, generator=generator)
        images.append(img)
        # optionally plot the image
        if show_each:
            show_image(img, scale=1)
    print('Done.')
    return {'images': images,
            'titles': titles}
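The `guide_tfm({'g': cur_sched})` call wraps a schedule in a guidance transform, which the pipeline then consults at each diffusion step. The sketch below shows the general idea with a hypothetical `ToyGuidance` class. Its call signature `(u, t, idx)` and its plain, un-normalized update are assumptions made for illustration; this is not the `cf_guidance` API.

```python
class ToyGuidance:
    """Hypothetical guidance transform: looks up the guidance value for each step."""

    def __init__(self, schedules):
        # expects a dict like {'g': [g_0, g_1, ...]} with one value per step
        self.schedules = schedules

    def __call__(self, u, t, idx):
        # guidance value for the current diffusion step
        g = self.schedules['g'][idx]
        # plain classifier-free guidance update: u + g * (t - u)
        return [ui + g * (ti - ui) for ui, ti in zip(u, t)]

# two-step toy schedule: no guidance, then full guidance
gtfm = ToyGuidance({'g': [0.0, 1.0]})
u, t = [1.0, 2.0], [3.0, 6.0]
step0 = gtfm(u, t, 0)
step1 = gtfm(u, t, 1)
print(step0, step1)
```

Passing the schedule at construction time lets the same generation loop run any experiment: only the transform class and the list of guidance values change.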
Now we generate images for the range of constant $G_\text{small}$ values. Then we will check the outputs to see what a good, default value might be.
T-Normalization with $G_\text{small}$ sweep
print('Running the G_small sweep experiments...')
t_norm_res = run(prompt, const_expts, guide_tfm=TNormGuidance)
Full Normalization with $G_\text{small}$ sweep
print('Running the G_small sweep experiments...')
full_norm_res = run(prompt, const_expts, guide_tfm=FullNormGuidance)
T-Normalization $G_\text{small}$ results
#| echo: false
#| output: true
# display all images
image_grid(t_norm_res['images'], title=t_norm_res['titles'], rows=5, width=width, height=height)
Full Normalization $G_\text{small}$ results
#| echo: false
#| output: true
# display all images
image_grid(full_norm_res['images'], title=full_norm_res['titles'], rows=5, width=width, height=height)