Transformers for small datasets?

Following this post with curiosity. Great question.