afshinea/stanford-cme-296-diffusion-large-vision-models

Diffusion & Large Vision Models cheatsheet for Stanford's CME 296

Goal

This repository aims to sum up in one place all the important notions covered in Stanford's CME 296 Diffusion & Large Vision Models course. It includes:

  • Generation paradigms: diffusion, score matching, convergence of diffusion methods with SDEs, flow matching
  • Multimodal guided generation: latent diffusion models with VAEs, Transformer-based representations, contrastive learning, self-supervised learning, guidance
  • Image generation architectures: convolutions, U-Net, attention mechanism, DiT, MM-DiT
  • Model training: pre-training, post-training, distillation, evaluation with feature-based metrics and MLLM-as-a-Judge

Content

VIP Cheatsheet

Illustration

Class website

cme296.stanford.edu

Authors

Afshine Amidi (Ecole Centrale Paris, MIT) and Shervine Amidi (Ecole Centrale Paris, Stanford University)
