Ultra-minimal autoregressive diffusion model for image generation

JosefAlbers/Aggressor

Aggressor: Ultra-minimal autoregressive diffusion model for image and speech generation

Sample outputs: CIFAR and MNIST images, plus an audio demo (wav_aggressor.mp4).

A minimal implementation of Autoregressive Image Generation without Vector Quantization.
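In the referenced paper, each image token is a continuous vector rather than an index into a quantized codebook. The autoregressive transformer produces a condition vector z for the next token, and a small denoising network is trained on that token with a per-token diffusion loss. Written in the paper's notation (aggressor.py may differ in details such as the noise schedule or the prediction target):

```latex
x_t = \sqrt{\bar\alpha_t}\,x + \sqrt{1-\bar\alpha_t}\,\varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I)

\mathcal{L}(z, x) = \mathbb{E}_{\varepsilon,\,t}\left[\left\lVert \varepsilon - \varepsilon_\theta(x_t \mid t, z) \right\rVert^2\right]
```

Here x is the clean token, t is a randomly drawn diffusion step, \bar\alpha_t is the cumulative noise schedule, and \varepsilon_\theta is the denoiser. Because the loss is a regression on noise, no discrete tokenizer or softmax over a vocabulary is needed.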

Key Features

  • Simple Architecture: A tiny transformer for autoregression and an MLP for diffusion.
  • Minimal Dependencies: Built from scratch using only basic MLX operations.
  • Single-File Implementation: The entire model lives in one Python file, aggressor.py.
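The "MLP for diffusion" can be pictured as a small network that takes a noisy token x_t, the diffusion step t, and the transformer's condition z, and predicts the noise. The sketch below is hypothetical: it is written in plain Python rather than MLX so it runs without dependencies, and it assumes a sinusoidal time embedding and simple concatenation of inputs. The names `Denoiser`, `time_embedding`, and `diffusion_loss` are illustrative and are not taken from aggressor.py; real implementations often inject t and z through residual blocks or adaptive normalization instead.

```python
import math
import random


def time_embedding(t, dim=8, max_period=10000.0):
    """Sinusoidal embedding of the diffusion step t (cosines, then sines)."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.cos(t * f) for f in freqs] + [math.sin(t * f) for f in freqs]


def init_linear(n_in, n_out, rng):
    """Return (weights, bias); weights[j] holds the input weights for output j."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, weights, bias):
    """Dense layer: out_j = sum_i x_i * weights[j][i] + bias[j]."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + b for row, b in zip(weights, bias)]


class Denoiser:
    """Hypothetical two-layer MLP predicting the noise in x_t, given t and z."""

    def __init__(self, token_dim, cond_dim, hidden=32, t_dim=8, seed=0):
        rng = random.Random(seed)
        self.t_dim = t_dim
        self.fc1 = init_linear(token_dim + t_dim + cond_dim, hidden, rng)
        self.fc2 = init_linear(hidden, token_dim, rng)

    def __call__(self, xt, t, z):
        # List "+" concatenates: [noisy token | time embedding | condition z]
        h = xt + time_embedding(t, self.t_dim) + z
        h = [v / (1.0 + math.exp(-v)) for v in linear(h, *self.fc1)]  # SiLU
        return linear(h, *self.fc2)


def diffusion_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)
```

During training, the true noise used to corrupt the token is the regression target, so `diffusion_loss(denoiser(x_t, t, z), noise)` is the quantity to minimize.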

Components

  • Aggressor: Main model class combining the transformer and the diffusion head.
  • Transformer: Multi-layer transformer with attention and MLP blocks.
  • Denoiser: MLP-based denoising network with a time embedding.
  • Scheduler: Handles the forward (noising) and reverse (denoising) diffusion processes.
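A scheduler of this kind typically precomputes a noise schedule, corrupts clean tokens in the forward process, and removes noise step by step in the reverse process. The sketch below assumes a standard DDPM formulation with a linear beta schedule; the class name, method names, step count, and beta range are hypothetical defaults, not values read from aggressor.py.

```python
import math
import random


class Scheduler:
    """Hypothetical DDPM-style scheduler with a linear beta schedule (assumed)."""

    def __init__(self, num_steps=1000, beta_start=1e-4, beta_end=0.02):
        self.num_steps = num_steps
        self.betas = [
            beta_start + (beta_end - beta_start) * i / (num_steps - 1)
            for i in range(num_steps)
        ]
        self.alphas = [1.0 - b for b in self.betas]
        # alpha_bar_t = alpha_0 * alpha_1 * ... * alpha_t
        self.alpha_bars = []
        prod = 1.0
        for a in self.alphas:
            prod *= a
            self.alpha_bars.append(prod)

    def add_noise(self, x0, t, noise):
        """Forward process: x_t = sqrt(ab_t) * x_0 + sqrt(1 - ab_t) * noise."""
        ab = self.alpha_bars[t]
        return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, noise)]

    def step(self, xt, t, eps_pred, rng=random):
        """Reverse process: one ancestral step x_t -> x_{t-1} from predicted noise."""
        alpha, beta, ab = self.alphas[t], self.betas[t], self.alpha_bars[t]
        coef = beta / math.sqrt(1.0 - ab)
        mean = [(x - coef * e) / math.sqrt(alpha) for x, e in zip(xt, eps_pred)]
        if t == 0:
            return mean  # no noise is added on the final step
        sigma = math.sqrt(beta)  # sigma_t^2 = beta_t, one of the DDPM choices
        return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]


def sample(scheduler, predict_noise, dim, seed=0):
    """Run the full reverse chain starting from pure Gaussian noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for t in reversed(range(scheduler.num_steps)):
        x = scheduler.step(x, t, predict_noise(x, t), rng)
    return x
```

For training, one would draw a random `t` and Gaussian `noise`, call `add_noise`, and regress the denoiser's output onto `noise`; `sample` is the generation-time counterpart, where `predict_noise` would wrap the trained denoiser together with the transformer's condition for the current token.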

Usage

python aggressor.py

(Training on 60,000 images for 20 epochs takes approximately 7–8 minutes on an 8 GB M2 MacBook.)
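For comparison with other hardware, the quoted time implies a rough training throughput. This counts each image once per epoch and ignores any evaluation or sampling time:

```latex
60{,}000 \times 20 = 1.2 \times 10^{6}\ \text{image passes}, \qquad
\frac{1.2 \times 10^{6}}{480\ \text{s}} = 2{,}500\ \text{images/s}, \qquad
\frac{1.2 \times 10^{6}}{420\ \text{s}} \approx 2{,}857\ \text{images/s}
```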

Acknowledgements

Thanks to lucidrains' excellent code, which inspired this project. The official implementation of the paper is available here.