Aggressor: Ultra-minimal autoregressive diffusion model for image and speech generation

Samples: CIFAR (cifar), MNIST (mnist), AUDIO (wav_aggressor.mp4)

The simplest possible implementation of the paper "Autoregressive Image Generation without Vector Quantization".

Key Features

  • Simple Architecture: A tiny transformer for autoregression and an MLP for diffusion.
  • Minimal Dependencies: Built from scratch using only basic MLX operations.
  • Single-File Implementation: The entire model lives in one Python file, aggressor.py.
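
To make the division of labor between the two parts concrete, here is a rough sketch of the generation loop only. It is not taken from aggressor.py: `toy_transformer`, `toy_denoise_sample`, and `generate` are hypothetical stand-ins written in plain Python (no MLX), and neither stub is a learned network. The point is the control flow: the autoregressive part turns the tokens produced so far into a conditioning vector, and the diffusion part samples the next continuous token from noise given that vector.

```python
import random


def toy_transformer(prefix, dim):
    """Hypothetical stand-in for the causal transformer.

    Maps the tokens generated so far to a conditioning vector. A real model
    would use attention; this placeholder just averages the prefix.
    """
    if not prefix:
        return [0.0] * dim
    return [sum(tok[i] for tok in prefix) / len(prefix) for i in range(dim)]


def toy_denoise_sample(cond, steps, rng):
    """Hypothetical stand-in for the diffusion MLP's sampling loop.

    Starts from Gaussian noise and refines it step by step, conditioned on
    `cond`. A real denoiser would predict noise at each timestep instead.
    """
    x = [rng.gauss(0.0, 1.0) for _ in cond]
    for _ in range(steps):
        x = [xi + 0.5 * (ci - xi) for xi, ci in zip(x, cond)]
    return x


def generate(num_tokens, dim, steps=10, seed=0):
    """Generate continuous tokens one at a time, with no vector quantization."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(num_tokens):
        cond = toy_transformer(tokens, dim)                    # autoregression
        tokens.append(toy_denoise_sample(cond, steps, rng))   # diffusion
    return tokens


if __name__ == "__main__":
    for token in generate(num_tokens=4, dim=3):
        print(token)
```

Each token is a list of floats rather than an index into a codebook, which is what "without vector quantization" refers to.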

Components

  • Aggressor: Main model class combining transformer and diffusion.
  • Transformer: Multi-layer transformer with attention and MLP blocks.
  • Denoiser: MLP-based diffusion process with time embedding.
  • Scheduler: Handles forward and backward processes for diffusion.
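
As an illustration of what the Scheduler's forward process can look like, here is a minimal sketch in plain Python. It assumes a DDPM-style linear beta schedule; the schedule actually used in aggressor.py may differ, and the names `make_alpha_bars`, `add_noise`, and `predict_x0` are hypothetical. The forward process mixes a clean sample with Gaussian noise according to x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps, and a backward step builds on inverting that formula with the denoiser's noise estimate.

```python
import math
import random


def make_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for an assumed linear beta schedule."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward process: noise a clean sample x0 to timestep t.

    Returns the noisy sample and the noise that was added, which is the
    regression target when training a noise-predicting denoiser.
    """
    a_bar = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def predict_x0(xt, eps_pred, t, alpha_bars):
    """Invert the forward formula given a noise estimate."""
    a_bar = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a_bar) * e) / math.sqrt(a_bar)
            for x, e in zip(xt, eps_pred)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = make_alpha_bars()
    xt, eps = add_noise([0.5, -1.0, 2.0], t=500, alpha_bars=alpha_bars, rng=rng)
    print(predict_x0(xt, eps, 500, alpha_bars))
```

With the true noise passed as `eps_pred`, `predict_x0` recovers the clean sample up to floating-point error; in practice the estimate would come from the Denoiser.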

Usage

python aggressor.py

(Training on 60,000 images for 20 epochs takes approximately 7 to 8 minutes on an 8 GB M2 MacBook.)

Acknowledgements

Thanks to lucidrains' fantastic code that inspired this project. The official implementation is available here.