12. GAN

1. Introduction

  • Deep generative models

    • $\hookrightarrow$ Likelihood-based

      1. Autoregressive models: Tractable density. $p(x) = \prod_{i=1}^{n} p(x_i \mid x_1, \dots, x_{i-1})$.
      2. VAE: Intractable density, latent space $Z$. $p(x) = \int p(z)\, p(x \mid z)\, dz$. Optimize the ELBO.
      3. Flow-based.
    • $\hookrightarrow$ Likelihood-free

      1. GAN.

      2. Diffusion (score-based).

  • Problem & Solution

    • $\Rightarrow$ Don't estimate $p(x)$; just make sampling possible.

    • $\hookrightarrow$ Want to sample!

    • (1) Sample from a simple dist. (e.g., Gaussian).

    • (2) Learn complex transformation. (Simple $\rightarrow$ Complex).


2. GAN (Generative Adversarial Networks)

  • Concept

    • Game-theoretic approach.

    • Discriminator ($D$): Distinguish Real vs. Fake.

      • Real $\rightarrow$ 1.

      • Fake ($G(z)$) $\rightarrow$ 0.

    • Generator ($G$): Fool the discriminator.

      • $z \sim \mathcal{N}(0, I)$ (Noise).

      • Generate fake data $G(z)$.

      • Wants $D(G(z)) \rightarrow 1$.

  • Diagram Flow

    • $z \text{ (Noise)} \xrightarrow{G} G(z) \text{ (Fake)} \xrightarrow{D} [0, 1]$

    • Real Data $x \xrightarrow{D} [0, 1]$

  • Objectives

    • Discriminator:

      • $D(x) \rightarrow 1$.

      • $D(G(z)) \rightarrow 0$.

    • Generator:

      • $D(G(z)) \rightarrow 1$.
  • Comparison: GAN vs Diffusion

    • GAN: High-quality samples, fast sampling.

      • Cons: Training instability, Mode Collapse.
    • Diffusion: Diverse samples, stable training.

      • Cons: Slow inference (Long sampling time).
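
A minimal PyTorch sketch of the generator/discriminator pair described above (layer widths and the 784-dim data size are illustrative assumptions, not from the notes):

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 100, 784   # e.g., flattened 28x28 images

# Generator G: noise z -> fake sample G(z)
G = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, data_dim), nn.Tanh(),   # outputs in [-1, 1]
)

# Discriminator D: sample -> probability of being real
D = nn.Sequential(
    nn.Linear(data_dim, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),       # D(x) in (0, 1)
)

z = torch.randn(16, latent_dim)   # z ~ N(0, I)
fake = G(z)                       # G(z): fake data
print(D(fake).shape)              # per-sample probability of "real", shape (16, 1)
```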

3. Formulation of Training Objectives

  • Minimax Game

    • $\min_{G} \max_{D} V(D, G) = E_{x \sim p_{data}}[\log D(x)] + E_{z \sim p_z}[\log(1 - D(G(z)))]$
  • Optimal Discriminator

    • For fixed $G$, the optimal discriminator $D^*$ is:

    • $D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$

    • Nash Equilibrium:

      • Occurs when $p_g(x) = p_{data}(x)$.

      • $D^*(x) = \frac{1}{2}$.

      • Value of game becomes $2 \log \frac{1}{2} = -\log 4$.

      • Related to minimizing Jensen-Shannon Divergence (JSD).
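
Filling in the steps behind $D^*$ and the JSD connection (for fixed $G$, the objective can be maximized pointwise in $x$):

$$V(D, G) = \int_x \big[\, p_{data}(x) \log D(x) + p_g(x) \log(1 - D(x)) \,\big]\, dx$$

Since $a \log t + b \log(1 - t)$ is maximized at $t^* = \frac{a}{a+b}$, the maximizer is $D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$. Substituting $D^*$ back:

$$V(D^*, G) = \mathbb{E}_{x \sim p_{data}}\!\left[\log \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}\right] + \mathbb{E}_{x \sim p_g}\!\left[\log \frac{p_g(x)}{p_{data}(x) + p_g(x)}\right] = 2\,\mathrm{JSD}(p_{data} \,\|\, p_g) - \log 4$$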

  • Gradient Issues (Vanishing Gradient)

    • Update Rule:

      • $\theta_g \leftarrow \theta_g - \eta \frac{\partial J}{\partial \theta_g}$.

      • $\frac{\partial J}{\partial \theta_g} = \frac{\partial J}{\partial D(G(z))} \cdot \frac{\partial D(G(z))}{\partial G(z)} \cdot \frac{\partial G(z)}{\partial \theta_g}$.

    • Problem:

      • If Discriminator is too good (Perfect $D$), then $D(G(z)) \approx 0$ (flat region of sigmoid).

      • $\frac{\partial D}{\partial G} \approx 0$.

      • G learns nothing (Gradient vanishes).

    • Solution (Heuristic / Non-saturating Loss):

      • Instead of minimizing $\log(1 - D(G(z)))$,

      • Maximize $\log D(G(z))$.

      • Provides stronger gradients early in training.
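
A minimal sketch of one training step under this objective, using the non-saturating generator loss (assuming PyTorch and reusing the toy `G`, `D`, and `latent_dim` from the Section 2 sketch; hyperparameters are illustrative):

```python
import torch
import torch.nn.functional as F

opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)

def train_step(x_real):
    batch = x_real.size(0)

    # --- Discriminator: maximize log D(x) + log(1 - D(G(z))) ---
    z = torch.randn(batch, latent_dim)
    fake = G(z).detach()                                            # block gradients into G
    loss_d = F.binary_cross_entropy(D(x_real), torch.ones(batch, 1)) \
           + F.binary_cross_entropy(D(fake),   torch.zeros(batch, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # --- Generator (non-saturating): maximize log D(G(z)) instead of minimizing log(1 - D(G(z))) ---
    z = torch.randn(batch, latent_dim)
    loss_g = F.binary_cross_entropy(D(G(z)), torch.ones(batch, 1))  # target "real" = 1
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```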


4. Advanced GANs

  • SAGAN (Self-Attention GAN)

    • Motivation:

      • Complex problem $\rightarrow$ Complex model.

      • Convolution only captures local dependencies.

    • Features:

      1. Self-Attention: Captures global dependencies.

      2. Spectral Normalization: Stabilizes discriminator training (Lipschitz constraint).

      3. Conditional Generation: $G(z \mid y)$, $D(x, y)$.
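
A minimal sketch of the first two ingredients (assuming PyTorch; `torch.nn.utils.spectral_norm` is the standard utility, and the attention layer is a simplified version of SAGAN's):

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Spectral normalization: constrain each layer's spectral norm (Lipschitz control for D).
d_layer = spectral_norm(nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1))

class SelfAttention2d(nn.Module):
    """Simplified SAGAN-style self-attention over spatial positions."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels // 8, 1)
        self.k = nn.Conv2d(channels, channels // 8, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))   # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.q(x).flatten(2).transpose(1, 2)    # (b, hw, c//8)
        k = self.k(x).flatten(2)                    # (b, c//8, hw)
        attn = torch.softmax(q @ k, dim=-1)         # (b, hw, hw): global dependencies
        v = self.v(x).flatten(2)                    # (b, c, hw)
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return x + self.gamma * out                 # residual connection

x = torch.randn(2, 64, 16, 16)
print(SelfAttention2d(64)(x).shape)                 # torch.Size([2, 64, 16, 16])
```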

5. Performance Evaluation

Quality - Diversity Trade-off

(1) Quality

  • **Conditional distribution $p(y \mid x)$**.
  • If image $x$ is clear (High Quality) $\rightarrow$ Classifier predicts class $y$ confidently.

  • Low entropy of $p(y \mid x)$ (sharp distribution).
    • (Quality is inversely proportional to the entropy of $p(y \mid x)$.)
  • Bad quality $x$ $\rightarrow$ High entropy.

(2) Diversity

  • **Marginal distribution $p(y) = \int p(y \mid x{=}G(z))\, p(z)\, dz$**.
  • If $G$ generates diverse classes:

    • High Entropy of $p(y)$ (Uniform distribution over classes).

    • Diverse $x = G(z)$ $\rightarrow$ High entropy.
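
A small sketch of how these two entropies are computed from a classifier's softmax outputs (assuming `probs` is an (N, K) array whose row $i$ is $p(y \mid x_i)$ for N generated samples; numpy only, demo data is random):

```python
import numpy as np

def quality_diversity_entropies(probs, eps=1e-12):
    """probs: (N, K) array, row i = p(y | x_i) from a pretrained classifier."""
    # Quality: average entropy of the conditionals p(y | x) -- lower is better.
    cond_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    # Diversity: entropy of the marginal p(y) = mean_i p(y | x_i) -- higher is better.
    p_y = probs.mean(axis=0)
    marg_entropy = -np.sum(p_y * np.log(p_y + eps))
    return cond_entropy, marg_entropy

probs = np.random.dirichlet(np.ones(10), size=100)   # stand-in classifier outputs
print(quality_diversity_entropies(probs))
```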

Evaluation Metrics

(1) Inception Score (IS)

  • $IS(G) = \exp\big(\mathbb{E}_{x \sim p_g} \big[ D_{KL}\big( p(y \mid x) \,\|\, p(y) \big) \big]\big)$
  • Uses a pre-trained Inception Network.

  • Goal:

    • $p(y \mid x)$ should be sharp (Low entropy) $\rightarrow$ High Quality.
    • $p(y)$ should be flat (High entropy) $\rightarrow$ High Diversity.

    • KL Divergence between them should be large.
  • Limitation:

    • If $G$ generates only one image per class:

      1. Classes are diverse $\rightarrow$ High entropy of $p(y)$.

      2. Each generated image is classified confidently $\rightarrow$ $p(y \mid x)$ is sharp.

      • Misrepresentation: IS is high, but actual (within-class) diversity is low, so mode collapse is not fully detected.
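
A sketch of the IS computation from classifier softmax outputs (again assuming `probs` is an (N, K) array of $p(y \mid x)$ from a pretrained Inception network; the split-into-groups averaging from the original paper is omitted for brevity):

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """IS = exp( E_x [ KL( p(y|x) || p(y) ) ] )."""
    p_y = probs.mean(axis=0)                                               # marginal p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1) # KL per sample
    return float(np.exp(kl.mean()))

probs = np.random.dirichlet(np.ones(10), size=100)   # stand-in classifier outputs
print(inception_score(probs))
```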

(2) Fréchet Inception Distance (FID)

  • $\hookrightarrow$ Measures distance between feature distributions of Real ($x_r$) and Generated ($x_g$) data.

  • $FID(x_r, x_g) = \|\mu_r - \mu_g\|^2 + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big)$
  • Assumes features follow Gaussian distribution.

  • Lower is better (Distance 0 means identical distributions).

  • More robust than IS.
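
A sketch of the FID formula on precomputed feature vectors (assuming `feat_r`, `feat_g` are (N, d) arrays of Inception features for real and generated images; uses `scipy.linalg.sqrtm` for the matrix square root):

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feat_r, feat_g):
    """Frechet distance between Gaussians fit to real/generated features."""
    mu_r, mu_g = feat_r.mean(axis=0), feat_g.mean(axis=0)
    cov_r = np.cov(feat_r, rowvar=False)
    cov_g = np.cov(feat_g, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):          # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2 * covmean))

feat_r = np.random.randn(500, 64)          # stand-ins for Inception features
feat_g = np.random.randn(500, 64) + 0.1
print(fid(feat_r, feat_g))                 # lower is better; 0 means identical Gaussians
```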