Taming Teacher Forcing for Masked Autoregressive Video Generation
February 2026
20 min read
Video Generation, Masked Modeling, Transformer, Teacher Forcing
What You'll Learn
- Video Generation Categories: Masked vs Fully Autoregressive
- The 'Training-Inference Mismatch' problem with Teacher Forcing
- MAGI's solution: Complete Teacher Forcing (CTF)
- Hybrid Transformer Backbone (Spatial + Temporal attention)
- Stabilization tricks: Dynamic Interval Training & Noise Injection
Key Concepts Covered
Teacher Forcing: Conditioning on ground-truth past frames during training, which causes a mismatch at inference, when the model must condition on its own predictions.
Complete Teacher Forcing (CTF): Conditioning on unmasked observed frames to mimic inference conditions.
Dynamic Interval Training: Randomly sampling frame intervals to handle varying motion speeds.
Noise Injection: Adding noise to observation frames to improve robustness against prediction errors.
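
To make the concepts above more concrete, here is a minimal sketch of how a training example might be assembled: frames are drawn at a randomly chosen interval, the observed frames are kept complete (unmasked), and noise is added to them. This is an illustration under stated assumptions, not MAGI's actual implementation. The function names, the uniform interval sampling, and the fixed-sigma Gaussian noise are all hypothetical choices, and each frame is simplified to a flat list of floats.

```python
import random

def sample_frame_indices(num_video_frames, clip_len, max_interval, rng):
    """Dynamic interval training (sketch): pick a random stride, then a random start.

    Assumes clip_len >= 2 and num_video_frames >= clip_len.
    """
    # Largest stride that still fits clip_len frames inside the video.
    largest_fit = (num_video_frames - 1) // (clip_len - 1)
    interval = rng.randint(1, min(max_interval, largest_fit))
    span = (clip_len - 1) * interval
    start = rng.randint(0, num_video_frames - 1 - span)
    return [start + i * interval for i in range(clip_len)]

def add_observation_noise(frames, sigma, rng):
    """Noise injection (sketch): perturb every value of the observed frames.

    Gaussian noise with a fixed sigma is an assumption made for illustration.
    """
    return [[x + rng.gauss(0.0, sigma) for x in frame] for frame in frames]

def build_training_pair(video, clip_len, num_observed, max_interval, sigma, seed=0):
    """Return (frame indices, noisy observed frames, target frames)."""
    rng = random.Random(seed)
    idx = sample_frame_indices(len(video), clip_len, max_interval, rng)
    clip = [video[i] for i in idx]
    # Observed frames stay complete (unmasked), as the model would see them
    # at inference; only noise is added to them.
    observed = add_observation_noise(clip[:num_observed], sigma, rng)
    # The remaining frames are the ones the model is trained to predict.
    targets = clip[num_observed:]
    return idx, observed, targets

# Toy "video": 32 frames, each a flat list of 4 values equal to the frame index.
video = [[float(t)] * 4 for t in range(32)]
idx, observed, targets = build_training_pair(
    video, clip_len=8, num_observed=3, max_interval=4, sigma=0.1
)
```

A larger sampled interval makes consecutive frames in the clip further apart in time, so the same clip length covers faster apparent motion. The noise on the observed frames stands in for the imperfect frames the model will have produced itself when generating autoregressively.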
Slide Overview
- Video Generation Landscape
- The Core Issue: Autoregression & Teacher Forcing
- MAGI & CTF Approach
- Architecture & Stabilization Tricks
