15. Score-based Model 3

Posted Dec 5, 2025

By Mingyu An

4 min read

Controlling the Generation Process

Can we control the generation process?
$P(x)$ $\downarrow$
Inverse distribution: $P(x \mid y)$
Bayes’ Rule:
- $p(x \mid y) = \frac{p(x) \cdot p(y \mid x)}{p(y)}$
Score function decomposition:
- $\nabla_x \log P(x \mid y) = \nabla_x \log P(x) + \nabla_x \log P(y \mid x)$
  - $\nabla_x \log P(x) \approx S_\theta(x)$ (Unconditional Score)
  - $\nabla_x \log P(y \mid x)$: Classifier term
$\Rightarrow$ We only need to learn $p(y \mid x)$, which is really just an image classifier.
$p(y)$ can be specified without any training.
Conclusion: a single score-based generative model is all we need.
- Classifier + unconditional generative model $P(x)$ (inverse problem)
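
As a worked step, the score decomposition follows from taking the log of Bayes' rule and differentiating with respect to $x$. The $p(y)$ term drops out because it does not depend on $x$:

$$
\begin{aligned}
\log p(x \mid y) &= \log p(x) + \log p(y \mid x) - \log p(y) \\
\nabla_x \log p(x \mid y) &= \nabla_x \log p(x) + \nabla_x \log p(y \mid x) - \underbrace{\nabla_x \log p(y)}_{=\,0}
\end{aligned}
$$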

Classifier Guidance

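$P(x)$
- ↳ To get good GAN performance, the classifier has to be strengthened.
DDPM only does unconditional generation; simply feeding in the label as well does not work.
- ↳ Let's build a discriminator without adversarial training.
$\nabla_x \log P(y \mid x)$ will play that role!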

Basic Idea

Goal: learn $\nabla_{x_t} \log P(x_t \mid y)$. (Originally $\nabla_{x_t} \log P(x_t) \approx S_\theta(x_t, t)$)
- ↳ $\nabla_{x_t} \log P(x_t \mid y) = \nabla_{x_t} \log \left( \frac{P(y \mid x_t) \cdot P(x_t)}{P(y)} \right)$
- $= \nabla_{x_t} \log P(x_t) + \nabla_{x_t} \log P(y \mid x_t)$
- $\rightarrow$ The second term is the classifier guidance.
- Figure: guidance in the same direction

Training & Sampling

Note: $\nabla \log P(x_t \mid y) = \nabla \log P(x_t) + \nabla \log P(y \mid x_t)$
Training:
- ① Score of the unconditional diffusion model
- ② Classifier that takes $x_t$ and predicts $y$
Sampling:
- ↳ Unconditional score function + gradient of the noisy classifier
- $P(\cdot)$ can be shared across noise levels and handled as $P_\phi(y \mid x_t, t)$

Guidance Scale

↳ Hyperparameter ($\gamma$)
$\nabla \log P_\gamma(x_t \mid y) = \nabla \log P(x_t) + \gamma \nabla \log P(y \mid x_t)$
- $\gamma=0$: $y$ is ignored.
- $\gamma \rightarrow \infty$: samples are tied strongly to $y$. (Quality $\uparrow$ Diversity $\downarrow$)
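
To make the guided score concrete, here is a minimal sketch on a 1D toy problem that is not from the lecture: $p(x)$ is assumed to be an equal mixture of $\mathcal{N}(-2, 1)$ and $\mathcal{N}(+2, 1)$, and $y=1$ means "right-hand mode". For this toy both the unconditional score and the classifier $p(y=1 \mid x) = \sigma(4x)$ are known in closed form, so no network is trained. The function names and the Langevin step size are illustrative choices.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function, written via tanh.
    return 0.5 * (1.0 + math.tanh(0.5 * z))

def uncond_score(x):
    # Score of the assumed toy prior 0.5*N(-2, 1) + 0.5*N(+2, 1).
    return -x + 2.0 * math.tanh(2.0 * x)

def classifier_grad(x):
    # d/dx log p(y=1 | x), with p(y=1 | x) = sigmoid(4x) for this mixture.
    return 4.0 * (1.0 - sigmoid(4.0 * x))

def guided_score(x, gamma):
    # Unconditional score plus gamma times the classifier gradient.
    return uncond_score(x) + gamma * classifier_grad(x)

def langevin_sample(gamma, rng, steps=500, step_size=0.01):
    # Unadjusted Langevin dynamics driven by the guided score.
    x = rng.gauss(0.0, 1.0)
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        x += step_size * guided_score(x, gamma) + math.sqrt(2.0 * step_size) * noise
    return x

rng = random.Random(0)
samples = [langevin_sample(1.0, rng) for _ in range(300)]
print(sum(samples) / len(samples))  # should land near +2
```

With $\gamma = 1$ the guided score in this toy reduces to $2 - x$, the score of $\mathcal{N}(2, 1)$, so the sampler targets the conditional distribution; with $\gamma = 0$ it falls back to the two-mode prior.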

How?

Train a classifier on noised inputs.
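
A rough sketch of how training data for such a noise-aware classifier could be built. It assumes the standard DDPM forward process $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$, which these notes do not spell out; the helper names, the toy dataset, and the three-level schedule are hypothetical.

```python
import math
import random

def noise_example(x0, alpha_bar_t, rng):
    """Forward-noise a clean sample: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    return [a * v + b * rng.gauss(0.0, 1.0) for v in x0]

def make_classifier_batch(dataset, alpha_bars, rng):
    """Build (x_t, t, y) triples for training p_phi(y | x_t, t)."""
    batch = []
    for x0, y in dataset:
        t = rng.randrange(len(alpha_bars))  # random noise level per example
        batch.append((noise_example(x0, alpha_bars[t], rng), t, y))
    return batch

rng = random.Random(0)
dataset = [([1.0, -1.0], 0), ([0.5, 2.0], 1)]  # toy (x0, label) pairs
alpha_bars = [0.99, 0.5, 0.01]                 # toy schedule: most to least signal
batch = make_classifier_batch(dataset, alpha_bars, rng)
```

The classifier sees the noise level $t$ alongside $x_t$, which is what lets one network be shared across all noise levels.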

Limitation

Requires a noise-aware classifier (a standard classifier will not work)
Unstable gradients
Computational cost

Classifier-Free Guidance (CFG)

↳ CFG uses a single diffusion model
Recall: $\nabla \log p(x_t \mid y) = \nabla \log p(x_t) + \nabla \log p(y \mid x_t)$
Implicit classifier:
- $\nabla \log p(y \mid x_t) = \nabla \log p(x_t \mid y) - \nabla \log p(x_t)$
- ↳ Conditioning dropout.
Formula:
- $\nabla \log p_\gamma(x_t \mid y) = \gamma \nabla \log p(x_t \mid y) + (1-\gamma) \nabla \log p(x_t)$
- ↳ Unconditional diffusion via conditioning dropout
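
The CFG formula can be recovered by substituting the implicit classifier into the guidance-scale form of classifier guidance:

$$
\begin{aligned}
\nabla \log p_\gamma(x_t \mid y) &= \nabla \log p(x_t) + \gamma \nabla \log p(y \mid x_t) \\
&= \nabla \log p(x_t) + \gamma \left( \nabla \log p(x_t \mid y) - \nabla \log p(x_t) \right) \\
&= \gamma \nabla \log p(x_t \mid y) + (1-\gamma) \nabla \log p(x_t)
\end{aligned}
$$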

Conditioning Dropout

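Learn $P(x \mid y)$ with conditioning dropout.
- ↳ Occasionally drop $y$ (10-20% of the time).
- ↳ The dropped label is replaced with $\emptyset$.
Both $P(x \mid y)$ and $P(x)$ are available from the same model.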
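
A minimal sketch of the dropout step itself, assuming the null condition $\emptyset$ is represented by Python's `None`; the function name and the 10% rate used in the demo are illustrative.

```python
import random

NULL = None  # stand-in for the null condition (the "empty set" token)

def drop_condition(y, p_drop, rng):
    """Return NULL with probability p_drop, otherwise the original label."""
    return NULL if rng.random() < p_drop else y

rng = random.Random(0)
labels = ["cat"] * 10000
seen = [drop_condition(y, 0.1, rng) for y in labels]
frac_dropped = sum(c is NULL for c in seen) / len(seen)
print(round(frac_dropped, 2))  # close to 0.1
```

Batches with the label replaced train the unconditional behaviour, and the rest train the conditional one, inside the same network.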

Guidance Scale Analysis

$\nabla \log p_\gamma(x_t \mid y) = \gamma \nabla \log p(x_t \mid y) + (1-\gamma) \nabla \log p(x_t)$
- $[\gamma=0]$: Unconditional generation
- $[\gamma=1]$: Standard conditional generation
- $[\gamma>1]$: Moves away from the unconditional score, since $(1-\gamma) < 0$.
  - Reduces the probability of generating samples that do not use the conditioning information.
  - (ex. GLIDE)

Steps in CFG

Prepare: A collection of (image, caption)
Training:
- Train a unified model $\epsilon_\theta(x_t, c)$
- ↳ Random caption dropping (10%)
- ↳ Caption kept (90%)
Sample:
- Draw $x_T$
- (a) Unconditional $\epsilon_\theta(x_t, \emptyset)$: Baseline prediction
- (b) Conditional $\epsilon_\theta(x_t, c)$: Conditioning
- (c) Apply $\gamma$: $\hat{\epsilon}_\theta(x_t) = \epsilon_{uncond} + \gamma(\epsilon_{cond} - \epsilon_{uncond})$
  - (noise predicted for "cat") $-$ (noise predicted with no condition) $\rightarrow$ "cat" vector
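
Steps (a) to (c) can be sketched as follows. `toy_eps_model` is a hypothetical stand-in for the trained $\epsilon_\theta$ so the snippet runs on its own; only `cfg_combine` reflects the formula in the notes.

```python
def cfg_combine(eps_uncond, eps_cond, gamma):
    """eps_hat = eps_uncond + gamma * (eps_cond - eps_uncond), elementwise."""
    return [u + gamma * (c - u) for u, c in zip(eps_uncond, eps_cond)]

def toy_eps_model(x_t, c):
    # Hypothetical stand-in for eps_theta(x_t, c); c=None is the null condition.
    shift = 0.0 if c is None else 1.0
    return [v - shift for v in x_t]

x_t = [0.5, -1.0]
eps_u = toy_eps_model(x_t, None)                # (a) unconditional prediction
eps_c = toy_eps_model(x_t, "cat")               # (b) conditional prediction
eps_hat = cfg_combine(eps_u, eps_c, gamma=3.0)  # (c) guided prediction
print(eps_hat)  # [-2.5, -4.0]
```

Setting `gamma=0.0` returns the unconditional prediction and `gamma=1.0` returns the conditional one, matching the guidance scale analysis. Each sampling step costs two forward passes of the same model.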

Modern Architecture

Latent Diffusion Model (LDM)

↳ Run diffusion in latent space. Most of the bits in an image encode perceptual detail.
Structure:
- $x \rightarrow E \rightarrow z \rightarrow \text{Diff} \rightarrow z_T$
- $z \rightarrow D \rightarrow x$
- The U-Net takes the condition as input.
Advantage:
- ① Simple denoising + faster synthesis (the latent is close to a normal distribution)
- ② Expressive design
- ③ An autoencoder can be used.
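
A back-of-the-envelope illustration of why working in latent space is cheaper. The numbers are assumptions chosen for illustration and are not from the lecture: a 512x512 RGB image, and an encoder $E$ that downsamples each side by 8 into 4 latent channels.

```python
def num_values(height, width, channels):
    """Number of scalar values in a height x width x channels tensor."""
    return height * width * channels

# Illustrative sizes only (assumed, not from the notes).
pixel_values = num_values(512, 512, 3)
latent_values = num_values(512 // 8, 512 // 8, 4)
ratio = pixel_values / latent_values
print(pixel_values, latent_values, ratio)  # 786432 16384 48.0
```

Under these assumptions the denoiser operates on 48 times fewer values per step than it would in pixel space.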

Diffusion Transformer (DiT)

↳ Replace the U-Net with a Transformer.
Forward SDE of the diffusion model.
The reverse process has been shown to admit a PF-ODE (probability flow ODE).

Advanced Models

DDIM

↳ Deterministic sampling.
Predicted:
- $x_{t-1} = \sqrt{\bar{\alpha}_{t-1}} \hat{x}_0 + \sqrt{1-\bar{\alpha}_{t-1}} \epsilon$
- (Note: the update is deterministic when $\epsilon$ is the model's predicted noise rather than a fresh sample)
Comparison:
- Originally (DDPM): $x_{t-1} = \mu_\theta(x_t, t) + \sigma_t \epsilon$ with $\sigma_t^2 = \beta_t$ and fresh noise $\epsilon$
Benefits:
- ① Higher quality
- ② Consistency
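
A minimal sketch of one deterministic DDIM update on plain Python lists. It assumes $\hat{x}_0$ is recovered from the predicted noise via the standard forward relation $x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon$, which is not written out in these notes. In the demo the true noise stands in for the network prediction; `ddim_step` and the $\bar{\alpha}$ values are illustrative.

```python
import math

def ddim_step(x_t, eps_hat, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update; no fresh noise is sampled."""
    out = []
    for x, e in zip(x_t, eps_hat):
        # Predicted clean sample implied by the predicted noise.
        x0_hat = (x - math.sqrt(1.0 - alpha_bar_t) * e) / math.sqrt(alpha_bar_t)
        # Re-noise x0_hat to the previous level using the same predicted noise.
        out.append(math.sqrt(alpha_bar_prev) * x0_hat
                   + math.sqrt(1.0 - alpha_bar_prev) * e)
    return out

x0, eps = [1.0, -2.0], [0.3, 0.1]
a_t, a_prev = 0.5, 0.8
x_t = [math.sqrt(a_t) * v + math.sqrt(1.0 - a_t) * e for v, e in zip(x0, eps)]
x_prev = ddim_step(x_t, eps, a_t, a_prev)
```

Because nothing random happens inside the step, the same $x_T$ always maps to the same output, which is the consistency benefit listed above.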

Consistency Models

↳ Faster than DDIM.
Concept:
- All points on the same ODE trajectory $\rightarrow$ same point in data space.
- $f_\theta(x_t, t) = x_0$ for all $t$
- ODE trajectory ($x_T \rightarrow x_0$)
Training:
- ① Directly train $f_\theta$
- ② Consistency distillation
  - Teacher: Multi-step
  - $\downarrow$
  - Student: Consistency
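
The self-consistency property can be checked on a toy case where the ODE trajectory is known in closed form. The setup is an assumption made for illustration, not part of the lecture: 1D data $x_0 \sim \mathcal{N}(0, \sigma_d^2)$ perturbed as $x_t = x_0 + t z$, for which the probability flow ODE is $dx/dt = t\,x / (\sigma_d^2 + t^2)$ and its solutions scale as $\sqrt{\sigma_d^2 + t^2}$. Here `consistency_fn` is the exact map rather than a trained $f_\theta$.

```python
import math

SIGMA_DATA = 1.5  # assumed std of the toy data distribution

def trajectory(x0, t):
    """Point at noise level t on the PF-ODE path that starts at x0."""
    return x0 * math.sqrt(SIGMA_DATA**2 + t**2) / SIGMA_DATA

def consistency_fn(x_t, t):
    """Exact consistency function for this toy: maps (x_t, t) to the path origin."""
    return x_t * SIGMA_DATA / math.sqrt(SIGMA_DATA**2 + t**2)

x0 = 0.7
levels = (0.0, 0.5, 5.0, 80.0)
outputs = [consistency_fn(trajectory(x0, t), t) for t in levels]
```

Every entry of `outputs` equals `x0` up to floating-point error, however noisy the input point was; a trained consistency model tries to reproduce this behaviour in a single network evaluation.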

Flow Matching

(Learn the ODE directly)
↳ Learning velocity!
$\Rightarrow$ Encourages straighter, simpler paths: faster sampling
How:
- Noise $\leftrightarrow$ Data
- Learn the velocity (rather than the score)
Example: Rectified flow
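
A small sketch of the rectified-flow idea, assuming the usual convention that $t=0$ is noise and $t=1$ is data with a straight-line path between them. The function names are illustrative, and the ideal velocity is plugged in directly where a trained network would normally go.

```python
def interpolate(noise, data, t):
    """Straight-line path: x_t = (1 - t) * noise + t * data."""
    return [(1.0 - t) * n + t * d for n, d in zip(noise, data)]

def velocity_target(noise, data):
    """Regression target for the velocity network on this path: data - noise."""
    return [d - n for n, d in zip(noise, data)]

def euler_sample(noise, velocity_fn, steps):
    """Integrate dx/dt = v(x, t) from t=0 (noise) to t=1 (data) with Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

noise, data = [0.5, -1.0], [2.0, 3.0]
v_star = velocity_target(noise, data)
one_step = euler_sample(noise, lambda x, t: v_star, steps=1)
print(one_step)  # [2.0, 3.0]
```

When the path is perfectly straight the velocity is constant, so a single Euler step already lands on the data point. This is the sense in which straighter paths allow faster sampling.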

Examples

Stable Diffusion 3: MMDiT + Rectified flow
Flux: Parallel Transformer
Nano Banana: Autoregressive

Lecture Notes, Deep Learning

This post is licensed under CC BY 4.0 by the author.
