This repository aims to sum up in one place all the important notions covered in Stanford's CME 296 Diffusion & Large Vision Models course. It includes:
- Generation paradigms: diffusion, score matching, convergence of diffusion methods with SDEs, flow matching
- Multimodal guided generation: latent diffusion models with VAEs, Transformer-based representations, contrastive learning, self-supervised learning, guidance
- Image generation architectures: convolutions, U-Net, attention mechanism, DiT, MM-DiT
- Model training: pre-training, post-training, distillation, evaluation with feature-based metrics and MLLM-as-a-Judge
Afshine Amidi (Ecole Centrale Paris, MIT) and Shervine Amidi (Ecole Centrale Paris, Stanford University)
