New Paper: Masked Autoencoders Are Scalable Vision Learners

(Meta-comment: I’m actually not sure which forum this would best fit into - seems like it would be useful to have a place where we can discuss new papers.)

This new work by Kaiming He et al. looks pretty interesting - they use a very simple masking setup to pre-train a ViT, and they appear to get very good results across a variety of tasks.
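
To make the masking idea concrete, here is a rough sketch of how I understand it: split the image into patches, pick a random subset to hide, and feed only the visible ones to the encoder. This is not the authors' code. The function name `random_mask_indices`, the toy patch list, and the 0.75 default for `mask_ratio` are my own assumptions for illustration.

```python
import random


def random_mask_indices(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) uniformly at random.

    mask_ratio is the fraction of patches to hide; 0.75 is an assumed
    default for this sketch, not something taken from the post above.
    """
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))

    # Shuffle all patch indices, then keep the first num_keep as visible.
    shuffled = list(range(num_patches))
    rng.shuffle(shuffled)

    keep = sorted(shuffled[:num_keep])
    masked = sorted(shuffled[num_keep:])
    return keep, masked


# Toy example: 16 "patches", each a 4-dimensional vector (hypothetical data).
patches = [[float(i)] * 4 for i in range(16)]
keep, masked = random_mask_indices(len(patches), seed=0)

# Only the visible patches would go to the encoder; a decoder would then
# be trained to reconstruct the patches at the masked positions.
visible = [patches[i] for i in keep]
```

With 16 patches and a ratio of 0.75, this leaves 4 visible patches and 12 masked ones; which ones are hidden depends on the seed.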

So far, I see an implementation by lucidrains.