# helpers to create cosine schedules
from cf_guidance.schedules import get_cos_sched
# normalizations for classifier-free guidance
from cf_guidance.transforms import GuidanceTfm, BaseNormGuidance, TNormGuidance
Dynamic Classifier-free Guidance Pt. 1
Experiments with cosine schedules for Classifier-free Guidance.
Introduction
This notebook continues previous experiments on dynamically changing the Classifier-free Guidance parameter.
To recap the earlier results: changing the guidance parameter \(G\) improved the quality of generated images. More specifically:
- Normalizing the guidance helped the image’s syntax.
- Scheduling the guidance improved the image details.
The combination of the two often makes for even better generations.
However, there is an open question about which dynamic changes work best. Ideally, this series will find changes that universally improve the quality of Diffusion images.
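For reference, the standard Classifier-free Guidance update blends the unconditional prediction \(u\) and the text-conditioned prediction \(t\) with a single scale \(G\). The dynamic schedules explored in this series simply replace that constant with a per-step value \(G_i\) at diffusion step \(i\):
\[
\text{pred} = u + G\,(t - u)
\quad\longrightarrow\quad
\text{pred}_i = u + G_i\,(t - u)
\]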
The cf_guidance library
Guidance schedules and normalizers are now available in the cf_guidance library!
We will use this library to generate images for a sweep of cosine schedules.
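As a quick preview (mirroring how the library is used later in this notebook), a schedule is just a list with one guidance value per diffusion step, and a guidance transform wraps that list so it can be applied during sampling:

```python
from cf_guidance.schedules import get_cos_sched
from cf_guidance.transforms import GuidanceTfm

# one guidance value per diffusion step, decaying along a cosine curve
g_sched = get_cos_sched(max_val=7.5, min_val=1, num_steps=50,
                        num_warmup_steps=0, warmup_init_val=0,
                        num_cycles=0.5, k_decay=1)

# the transform applies the schedule at each step of the diffusion loop
gtfm = GuidanceTfm({'g': g_sched})
```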
Experiment Setup
The following section sets up the imports and helpers we need for the runs.
First we import the needed python modules.
Imports
import os
import math
import random
import warnings
from PIL import Image
from typing import List
from pathlib import Path
from types import SimpleNamespace
from fastcore.all import L
import numpy as np
import matplotlib.pyplot as plt
from textwrap import wrap
from tqdm.auto import tqdm
# imports for diffusion models
import torch
from transformers import logging
from transformers import CLIPTextModel, CLIPTokenizer
from huggingface_hub import notebook_login
from diffusers import StableDiffusionPipeline
from diffusers import AutoencoderKL, UNet2DConditionModel
from diffusers import LMSDiscreteScheduler
# for clean outputs
"ignore")
warnings.filterwarnings(
logging.set_verbosity_error()
# set the hardware device
= "cuda" if torch.cuda.is_available() else "mps" if torch.has_mps else "cpu" device
Helper functions
The functions below help with:
- Generating text embeddings from a given prompt.
- Converting Diffusion latents to a PIL image.
- Plotting the images to visualize results.
def text_embeddings(prompts, maxlen=None):
    "Extracts text embeddings from the given `prompts`."
    maxlen = maxlen or tokenizer.model_max_length
    inp = tokenizer(prompts, padding="max_length", max_length=maxlen, truncation=True, return_tensors="pt")
    return text_encoder(inp.input_ids.to(device))[0]
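As a rough check (assuming a Stable Diffusion v1-style text encoder, which pads to 77 tokens and has a hidden size of 768), a single prompt should come back as a tensor of shape (1, 77, 768):

```python
# quick shape check; for SD v1-style CLIP text encoders this is typically
# torch.Size([1, 77, 768])
emb = text_embeddings(['a photograph of an astronaut riding a horse'])
print(emb.shape)
```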
def image_from_latents(latents):
    "Scales diffusion `latents` and turns them into a PIL Image."
    # scale and decode the latents
    latents = 1 / 0.18215 * latents
    with torch.no_grad():
        data = vae.decode(latents).sample[0]

    # Create PIL image
    data = (data / 2 + 0.5).clamp(0, 1)
    data = data.cpu().permute(1, 2, 0).float().numpy()
    data = (data * 255).round().astype("uint8")
    image = Image.fromarray(data)
    return image
def show_image(image, scale=0.5):
    "Displays the given `image` resized based on `scale`."
    img = image.resize(((int)(image.width * scale), (int)(image.height * scale)))
    display(img)
    return img
def image_grid(images, rows=1, width=256, height=256, title=None):
    "Display an array of images in a grid with the given number of `rows`"
    count = len(images)
    cols = int(count / rows)
    if cols * rows < count:
        rows += 1
    # Calculate fig size based on individual image sizes
    px = 1 / plt.rcParams['figure.dpi']
    w = cols * width * px
    # Add some extra space for the caption/title since that can wrap
    h = (rows * height * px) + (rows * 30 * px)
    fig, axes = plt.subplots(rows, cols, figsize=(w, h))
    for y in range(rows):
        for x in range(cols):
            index = y*cols + x
            ref = axes[x] if rows == 1 else axes[y] if cols == 1 else axes[y, x]
            ref.axis('off')
            if index > count - 1:
                continue
            img = images[index]
            txt = f'Frame: {index}'
            if title is not None:
                if isinstance(title, str):
                    txt = f'{title}: {index}'
                elif isinstance(title, List):
                    txt = title[index]
            # small change for bigger, more visible titles
            txt = '\n'.join(wrap(txt, width=70))
            ref.set_title(txt, fontsize='x-large')
            ref.imshow(img)
            ref.axis('off')
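For example, a hypothetical call with three placeholder images and per-image titles would look like this:

```python
# hypothetical usage: three solid-color placeholder images in a single row
demo_imgs = [Image.new('RGB', (256, 256), color=c) for c in ('red', 'green', 'blue')]
image_grid(demo_imgs, rows=1, title=['red', 'green', 'blue'])
```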
Loading a Diffusion pipeline
We need to dynamically change the diffusion guidance parameter \(G\).
That means we need more control than what is available in the high-level HuggingFace APIs. To achieve this control, we load each piece of a Diffusion pipeline separately. Then, we can write our own image generation loop with full control over \(G\).
The get_sd_pieces function loads and returns the separate components of a Stable Diffusion pipeline.
def get_sd_pieces(model_name, dtype=torch.float32, better_vae='ema'):
    "Loads and returns the individual pieces in a Diffusion pipeline."
    # create the tokenizer and text encoder
    tokenizer = CLIPTokenizer.from_pretrained(
        model_name,
        subfolder="tokenizer",
        torch_dtype=dtype)
    text_encoder = CLIPTextModel.from_pretrained(
        model_name,
        subfolder="text_encoder",
        torch_dtype=dtype).to(device)

    # we are using a VAE from stability that was trained for longer than the baseline
    if better_vae:
        assert better_vae in ('ema', 'mse')
        vae = AutoencoderKL.from_pretrained(f"stabilityai/sd-vae-ft-{better_vae}", torch_dtype=dtype).to(device)
    else:
        vae = AutoencoderKL.from_pretrained(model_name, subfolder='vae', torch_dtype=dtype).to(device)

    # build the unet
    unet = UNet2DConditionModel.from_pretrained(
        model_name,
        subfolder="unet",
        torch_dtype=dtype).to(device)

    # enable unet attention slicing
    slice_size = unet.config.attention_head_dim // 2
    unet.set_attention_slice(slice_size)

    # build the scheduler
    scheduler = LMSDiscreteScheduler.from_config(model_name, subfolder="scheduler")

    return (
        tokenizer,
        text_encoder,
        vae,
        unet,
        scheduler,
    )
Picking a model
These runs use the openjourney model from Prompt Hero. openjourney was fine-tuned to create images in the style of Midjourney v4.
To trigger this style, we need to add the special keyword "mdjrny-v4" at the front of an input text prompt.
# set the diffusion model
= "prompthero/openjourney"
model_name
# other possible models
# model_name = "CompVis/stable-diffusion-v1-4"
# model_name = "runwayml/stable-diffusion-v1-5"
Next we use the function get_sd_pieces to load this model. The pieces are loaded in float16 precision.
# set the data type for the pipeline
dtype = torch.float16

# load the individual diffusion pieces
pieces = get_sd_pieces(model_name, dtype=dtype)
(tokenizer, text_encoder, vae, unet, scheduler) = pieces
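One caveat worth noting: float16 roughly halves memory use but is primarily meant for GPU inference. If you are running on CPU, a fallback like the sketch below (an assumption for convenience, not part of the original setup) keeps the rest of the code unchanged:

```python
# optional alternative (an assumption, not from the original post):
# float16 is best suited to CUDA GPUs; on CPU, float32 is the safer default
dtype = torch.float16 if device == "cuda" else torch.float32
```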
Text prompt for generations
We use the same input text prompt from the previous notebook:
“a photograph of an astronaut riding a horse”
But we add the special prefix keyword "mdjrny-v4" to create Midjourney-style images.
# input text prompt for diffusion models
= "mdjrny-v4 style a photograph of an astronaut riding a horse" prompt
Generating images
The images will be generated over \(50\) diffusion steps, at a size of 512 x 512 pixels.
# the number of diffusion steps
num_steps = 50

# generated image dimensions
width, height = 512, 512
Creating a fixed starting point for diffusion
The code below creates an initial set of latent noise.
The idea is for every generation to start from this shared, fixed noise. That way we can be sure that only our guidance changes are having an effect on the output image.
# create the shared, initial latents
seed = 1024
torch.manual_seed(seed)
init_latents = torch.randn((1, unet.in_channels, height//8, width//8), dtype=unet.dtype, device=device)
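As a quick sanity check (assuming a Stable Diffusion v1-style UNet with 4 latent channels and a VAE downsampling factor of 8), the latents for a 512 x 512 image should have shape (1, 4, 64, 64):

```python
# sanity check on the latent shape; for an SD v1-style UNet (4 latent channels)
# and 512 x 512 images, this should print: torch.Size([1, 4, 64, 64])
print(init_latents.shape)
```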
Image generation function
Below is the main image generation function: generate. It uses the Stable Diffusion components we loaded earlier.
Note that this function is almost identical to the StableDiffusionPipeline from HuggingFace. The main difference is plugging in our Guidance Transform instead of doing the default Classifier-free Guidance update.
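To make that concrete, here is a simplified stand-in for what such a transform does at each step. This is only a sketch of the idea (a per-step guidance update), not the actual cf_guidance implementation:

```python
class NaiveScheduledGuidance:
    "Simplified stand-in for a guidance transform; NOT the cf_guidance implementation."
    def __init__(self, schedules):
        # `schedules['g']` holds one guidance value per diffusion step
        self.schedules = schedules

    def __call__(self, u, t, idx=None):
        # the classic Classifier-free Guidance update, with a per-step G
        g = self.schedules['g'][idx]
        return u + g * (t - u)
```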
def generate(prompt, guide_tfm=None, width=width, height=height, steps=num_steps, **kwargs):
    # make sure we have a guidance transformation
    assert guide_tfm

    # prepare the text embeddings
    text = text_embeddings(prompt)
    uncond = text_embeddings('')
    emb = torch.cat([uncond, text]).type(unet.dtype)

    # start from the shared, initial latents
    latents = torch.clone(init_latents)
    scheduler.set_timesteps(steps)
    latents = latents * scheduler.init_noise_sigma

    # run the diffusion process
    for i,ts in enumerate(tqdm(scheduler.timesteps)):
        inp = scheduler.scale_model_input(torch.cat([latents] * 2), ts)
        with torch.no_grad():
            tf = ts
            if torch.has_mps:
                tf = ts.type(torch.float32)
            u,t = unet(inp, tf, encoder_hidden_states=emb).sample.chunk(2)

        # call the guidance transform
        pred = guide_tfm(u, t, idx=i)

        # update the latents
        latents = scheduler.step(pred, ts, latents).prev_sample

    # decode and return the final latents
    image = image_from_latents(latents)
    return image
Function to run the experiments
The run function below generates images for a given prompt.
It takes an argument guide_tfm for the specific GuidanceTfm class that will guide the outputs. The schedules argument has the set of schedules to sweep for \(G\).
def run(prompt, schedules, guide_tfm=None, show_each=True, test_run=False):
    # store generated images and their title (the experiment name)
    images, titles = [], []

    assert guide_tfm
    print(f'Using Guidance Transform: {guide_tfm}')

    if test_run:
        print(f'Running a single schedule for testing.')
        schedules = schedules[:1]

    ns = len(schedules)
    # run all schedule experiments
    for i,s in enumerate(schedules):
        # parse out the title for the current run
        cur_title = s['title']
        titles.append(cur_title)

        # create the guidance transformation
        cur_sched = s['schedule']
        gtfm = guide_tfm({'g': cur_sched})

        print(f'Running experiment [{i+1} of {ns}]: {cur_title}...')
        img = generate(prompt, gtfm)
        images.append(img)

        # optionally plot the image
        if show_each:
            show_image(img, scale=1)

    print('Done.')
    return {
        'images': images,
        'titles': titles,
    }
The Baseline: Guidance with a constant \(G = 7.5\)
We need a good baseline to find out exactly how dynamic guidances affect the output images.
Below we create the baseline Classifier-free Guidance with a static, constant update of \(G = 7.5\).
The code below seems like a lot of overhead, but it will be very helpful to sweep a variety of cosine schedules later on.
# create the baseline Classifier-free Guidance
baseline_run = {
    'max_val': [7.5],
}

# parameters we are sweeping
baselines_names = sorted(list(baseline_run))
baseline_scheds = L()

# step through each parameter
for idx,name in enumerate(baselines_names):
    # step through each of its values
    for idj,val in enumerate(baseline_run[name]):
        # create the baseline experiment
        expt = {
            'param_name': name,
            'val': val,
            'schedule': [val for _ in range(num_steps)]
        }
        # for plotting
        expt['title'] = f'Param: "{name}", val={val}'

        # add to the running list of experiments
        baseline_scheds.append(expt)
Let’s create the baseline image. The hope is that our guidance changes can then improve on it.
baseline_res = run(prompt, baseline_scheds, guide_tfm=GuidanceTfm, show_each=True)
Using Guidance Transform: <class 'cf_guidance.transforms.GuidanceTfm'>
Running experiment [1 of 1]: Param: "max_val", val=7.5...
Done.
Not bad for a starting point. Let’s see if we can do better.
Trying to improve the baseline
We now use the cf_guidance library to create a variety of Cosine schedules.
From the previous notebook, we know that a regular Cosine schedule can improve image generations. The goal now is to figure out exactly what makes for a good Cosine schedule.
This is a hard question to answer. We can start by trying incremental, minimum-pair changes to a baseline schedule. Then we check what effects, if any, our dynamic guidances have on the generated images.
The recent trend in Diffusion variance schedules is to add noise in a slower, more gradual way. This change seems to improve both model training and inference generations.
We will keep this in mind in future notebooks as we explore more schedules.
Once we have a sense of which schedules work (and which don’t), we can explore the scheduling space in a better, more principled way.
Default schedule parameters
We start from the guidance schedule values used in the previous notebook.
Recall that there were three kinds of schedules:
- A static schedule with a constant \(G\).
- A decreasing Cosine schedule.
- A Cosine schedule with some initial warm up steps.
# Default schedule parameters from the blog post
######################################
max_val          = 7.5  # guidance scaling value
min_val          = 1    # minimum guidance scaling
num_steps        = 50   # number of diffusion steps
num_warmup_steps = 0    # number of warmup steps
warmup_init_val  = 0    # the initial warmup value
num_cycles       = 0.5  # number of cosine cycles
k_decay          = 1    # k-decay for cosine curve scaling
######################################
To make sure our changes always reference this shared starting point, we can wrap these parameters in a dictionary.
DEFAULT_COS_PARAMS = {
    'max_val': max_val,
    'num_steps': num_steps,
    'min_val': min_val,
    'num_cycles': num_cycles,
    'k_decay': k_decay,
    'num_warmup_steps': num_warmup_steps,
    'warmup_init_val': warmup_init_val,
}
Then, every minimum-pair change will start from this shared dictionary and update a single parameter. The cos_harness below gives us an easy way of making these minimum-pair changes.
def cos_harness(new_params={}, cos_params=DEFAULT_COS_PARAMS):
    '''Creates cosine schedules with updated parameters in `new_params`'''
    # start from the given baseline `cos_params`
    cos_params = dict(cos_params)

    # update the schedule with any new parameters
    if new_params: cos_params.update(new_params)

    # return the new cosine schedule
    sched = get_cos_sched(**cos_params)
    return sched
Let’s use the cosine harness to plot three test schedules, just to make sure things are working:
- The baseline with no warmup.
- Warmup for 5 steps.
- Warmup for 10 steps.
The schedule plotting function plot_schedules is available in the post’s notebook.
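That helper is not reproduced here. For completeness, a minimal sketch of what such a function might look like (an assumption based only on how it is called below, not the notebook's actual code):

```python
def plot_schedules(schedules, rows=1, titles=None):
    "Minimal sketch of a schedule plotter; the real helper lives in the post's notebook."
    schedules, titles = list(schedules), list(titles or [])
    cols = math.ceil(len(schedules) / rows)
    fig, axes = plt.subplots(rows, cols, figsize=(4 * cols, 3 * rows), squeeze=False)
    for i, sched in enumerate(schedules):
        ax = axes[i // cols][i % cols]
        ax.plot(sched)
        if i < len(titles):
            ax.set_title(titles[i])
    plt.tight_layout()
    plt.show()
```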
# plot cosine schedules with different number of warmup steps
warmup_steps = (0, 5, 10)
warm_g = L(
    {'sched': cos_harness({'num_warmup_steps': w}),
     'title': f'Warmup Steps: {w}'}
    for w in warmup_steps
)
# plot the schedules
print('Plotting sample cosine schedules...')
plot_schedules(warm_g.itemgot('sched'), rows=1, titles=warm_g.itemgot('title'))
Plotting sample cosine schedules...
Creating the Cosine experiments
Now we can create the different Cosine schedules that will be swept.
cos_param_sweep = {
    'num_warmup_steps': [5, 10, 15],
    'num_cycles': [1, 1.5],
    'k_decay': [0.8, 0.6],
    'max_val': [10],
    'min_val': [3],
}
param_names = sorted(list(cos_param_sweep))

cos_scheds = L()
for idx,name in enumerate(param_names):
    for idj,val in enumerate(cos_param_sweep[name]):
        # create the cosine experiment
        expt = {
            'param_name': name,
            'val': val,
            'schedule': cos_harness({name: val})
        }
        # for plotting
        expt['title'] = f'Param: "{name}", val={val}'
        cos_scheds.append(expt)

plot_schedules(cos_scheds.itemgot('schedule'), rows=3, titles=cos_scheds.itemgot('title'))
Running the cosine experiments
We use the run function from before to run all of the cosine experiments.
cos_res = run(prompt, cos_scheds, guide_tfm=GuidanceTfm, show_each=False)
Using Guidance Transform: <class 'cf_guidance.transforms.GuidanceTfm'>
Running experiment [1 of 9]: Param: "k_decay", val=0.8...
Running experiment [2 of 9]: Param: "k_decay", val=0.6...
Running experiment [3 of 9]: Param: "max_val", val=10...
Running experiment [4 of 9]: Param: "min_val", val=3...
Running experiment [5 of 9]: Param: "num_cycles", val=1...
Running experiment [6 of 9]: Param: "num_cycles", val=1.5...
Running experiment [7 of 9]: Param: "num_warmup_steps", val=5...
Running experiment [8 of 9]: Param: "num_warmup_steps", val=10...
Running experiment [9 of 9]: Param: "num_warmup_steps", val=15...
Done.
Results
Going through the images, we can spot a few patterns.
Changing the minimum value of \(G\) does not have a huge effect. However, raising the max value of \(G\) to \(10\) actively hurts the image.
It seems that going through more cosine cycles improved the horse’s anatomy. It now has two hind legs and its body looks more proportional.
The number of warmup steps has a mixed effect. The lower values of \(5\) and \(10\) produce… interesting outputs with morphed horses. If it weren’t for the floating horse head at \(10\) warmup steps, it would be a solid improvement. At \(15\) warmup steps we get a lovely image with a nice, detailed background.
Analysis
Certain Cosine schedules seem promising. They either increase the details of the astronaut or background, or they create more anatomically correct horses.
In the rest of the series, we will explore the promising Cosine changes:
- Setting a higher Guidance ceiling.
- Allowing the Cosine to go through multiple cycles.
- Warming up for a few steps.
Bringing in Normalizations
In the previous notebooks, we found that normalization can substantially improve generated images. The next logical step is to add normalizations to our schedules and see if the gains compound.
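As a preview of how that would slot into the existing harness, the sketch below reuses the same cosine sweep with a normalizing guidance class. This assumes the normalizers share the GuidanceTfm constructor and call interface, as the shared import at the top of the notebook suggests:

```python
# hypothetical next step: re-run the same cosine sweep, but with a normalizing
# guidance transform from cf_guidance instead of the plain GuidanceTfm
norm_res = run(prompt, cos_scheds, guide_tfm=BaseNormGuidance, show_each=False)
```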
Conclusion
This notebook used the cf_guidance library to run a set of Guidance experiments. We swept a variety of Cosine schedules and compared the results to a baseline generation.
We showed that the guidance schedule has a big impact on the quality and syntax of generated images. We also found a set of Cosine schedules with the potential to improve generated images.
In the next part of this series, we will combine cosine schedules with normalizations.