How Diffusion Works

Intermediate

Build intuition for forward corruption, reverse denoising, schedulers, and score-based learning.

Last updated: Feb 25, 2026

Forward Process (Corruption)

The forward process gradually injects noise into clean samples until structure is mostly destroyed. This creates training pairs that teach the model what noise was added at each step.

Start from clean data

Sample x0 from real data (image, text embedding, or latent).

Add small noise repeatedly

Each timestep adds controlled Gaussian or discrete corruption.

Reach near-random state

After many steps, xT approximates a simple prior distribution, typically a standard Gaussian.
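The three steps above have a convenient closed form: instead of corrupting step by step, x_t can be sampled directly from x0. A minimal NumPy sketch, with illustrative schedule values (function names here are our own, not from any library):

```python
import numpy as np

def make_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative signal retention: product of (1 - beta_t) for a linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def forward_sample(x0, t, alpha_bar, rng=None):
    """Sample x_t ~ q(x_t | x0) in one shot:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bar[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps, eps
```

At small t the sample is nearly clean; near t = T-1, alpha_bar is close to zero, so x_T is almost pure noise from the simple prior.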

Reverse Process (Generation)

Generation starts from noise and applies learned denoising steps. The model predicts a cleaner state (or the noise residual) and iteratively moves toward a realistic sample.
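One such denoising step can be sketched in the DDPM style, assuming `eps_hat` comes from some trained noise-prediction network (a hypothetical model, not defined here):

```python
import numpy as np

def reverse_step(x_t, t, eps_hat, betas, alpha_bar, rng=None):
    """One learned denoising step: move x_t toward x_{t-1} using the
    predicted noise eps_hat, then add fresh noise (except at t=0)."""
    rng = rng or np.random.default_rng()
    alpha_t = 1.0 - betas[t]
    # Posterior mean: remove the predicted noise contribution, then rescale.
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat) / np.sqrt(alpha_t)
    if t == 0:
        return mean  # final step is deterministic
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)
```

Running this from t = T-1 down to t = 0 is the iterative refinement described above: each call produces a slightly cleaner state.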

Noise Prediction Intuition

Predicting noise is often easier than directly predicting the clean sample. Once noise is estimated, you can subtract it and recover structure progressively.
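"Subtract it and recover structure" is an algebraic inversion of the forward relation. A sketch, assuming the same notation as above:

```python
import numpy as np

def predict_x0(x_t, eps_hat, alpha_bar_t):
    """Invert x_t = sqrt(ab) * x0 + sqrt(1 - ab) * eps for x0,
    using the model's noise estimate eps_hat in place of the true eps."""
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)
```

If the noise estimate were exact, this would recover x0 exactly; in practice the estimate improves as sampling proceeds, so the implied clean image sharpens step by step.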

Schedulers: Linear vs Cosine

Schedulers define how much noise is added or removed per step. They strongly affect stability, sample quality, and speed.

Linear schedule

Noise changes at a constant rate per step. Simple, predictable, and easy to implement.

Cosine schedule

Allocates denoising effort non-linearly, often preserving useful signal longer and improving perceptual quality.
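The two schedules can be compared directly through their cumulative signal retention, alpha_bar. A sketch (the cosine form follows Nichol & Dhariwal's improved-DDPM parameterization; constants are illustrative):

```python
import numpy as np

def linear_alpha_bar(T, beta_start=1e-4, beta_end=0.02):
    """Signal retention under a linear beta schedule: constant-rate corruption."""
    return np.cumprod(1.0 - np.linspace(beta_start, beta_end, T))

def cosine_alpha_bar(T, s=0.008):
    """Cosine schedule: alpha_bar(t) ~ cos^2(((t/T + s)/(1 + s)) * pi/2).
    Signal decays slowly at first, preserving structure through more of the chain."""
    t = np.arange(T + 1) / T
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return (f / f[0])[1:]
```

At the midpoint of a 1000-step chain, the cosine schedule retains substantially more signal than the linear one, which is one reason it tends to preserve perceptually useful structure longer.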

Score Matching Intuition

A score function estimates the gradient of log-density, pointing toward more likely data regions. Denoising updates follow this direction, nudging noisy states back to the data manifold.
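For Gaussian corruption, the score and the added noise are directly related: the score of the noised distribution equals -eps / sigma, which is why a noise predictor doubles as a score estimator. A sketch verifying the identity for a 1-D Gaussian:

```python
import numpy as np

def gaussian_score(x, mu, sigma):
    """Score of N(mu, sigma^2): gradient of log-density w.r.t. x.
    Points from x back toward the mean, i.e. toward higher density."""
    return (mu - x) / sigma**2

def noise_to_score(eps, sigma):
    """Denoising view: the (rescaled, negated) noise estimate is the score."""
    return -eps / sigma
```

Writing x = mu + sigma * eps and substituting shows the two expressions agree, so following the negative predicted noise is exactly a step up the log-density gradient.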

Noise-to-Signal Visualizer

Adjust noise manually or autoplay forward/reverse to see how structure fades and re-emerges.

Forward & Reverse Diffusion Process

Watch a clean image get progressively corrupted by noise, then recovered — the core idea behind diffusion models.


Diffusion models learn to reverse the noise process. During training, the model sees images at every noise level and learns to predict the clean version. During generation, it starts from pure noise and iteratively denoises — producing a new image from nothing.
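The training recipe described above reduces to a simple regression: pick a timestep, noise the sample, and penalize the model's noise estimate. A sketch of one training example, with `eps_model` standing in for any noise-prediction network (a hypothetical placeholder):

```python
import numpy as np

def training_loss(x0, t, alpha_bar, eps_model, rng=None):
    """Single DDPM training example: noise x0 to level t, then score
    the model's noise prediction with mean squared error."""
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    eps_hat = eps_model(x_t, t)  # the network's noise estimate
    return np.mean((eps - eps_hat) ** 2)
```

Averaging this loss over random timesteps is what exposes the model to every noise level during training.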

Key Takeaways

  • Diffusion training learns to invert a known corruption process.
  • Reverse sampling is iterative refinement, not one-step generation.
  • Schedulers control where model capacity is spent across timesteps.
  • Score estimation provides a geometric view of denoising direction.