Pretrained language model that enables non-autoregressive generation

Great! I think it's very much feasible to implement conditional random fields on top of BERT - cool idea!
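In case it helps, here's a minimal sketch of what a BERT + CRF tagger could look like, assuming the `pytorch-crf` package (`pip install pytorch-crf`) alongside `transformers`; the model name and class name are just placeholders:

```python
import torch.nn as nn
from torchcrf import CRF
from transformers import BertModel


class BertCrfTagger(nn.Module):
    def __init__(self, num_tags: int, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        # Project BERT's hidden states to per-token emission scores
        self.emissions = nn.Linear(self.bert.config.hidden_size, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        emissions = self.emissions(hidden)
        mask = attention_mask.bool()
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence under the CRF
            return -self.crf(emissions, tags, mask=mask)
        # Inference: Viterbi-decode the most likely tag sequence per example
        return self.crf.decode(emissions, mask=mask)
```

The nice part is that the CRF layer only adds a transition matrix over the tag set, so the BERT encoder can be fine-tuned end-to-end with the CRF loss.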

Regarding pretraining:
Pre-training BERT from scratch on English takes quite some time, since the English corpus is so massive. Maybe just fine-tuning it makes sense as a first step? Or further pre-training an already pre-trained English BERT on some domain-specific data? A sketch of the latter is below.
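For the further pre-training route, something like the following could work, assuming the `transformers` and `datasets` libraries; the corpus file name and hyperparameters are placeholders:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Plain-text domain corpus, one passage per line (placeholder file name)
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

# Masks 15% of tokens on the fly, as in the original BERT recipe
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="bert-further-pretrained",
        per_device_train_batch_size=16,
        num_train_epochs=1,
    ),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```

Since the model starts from the pre-trained checkpoint, even a single epoch over a modest domain corpus is usually much cheaper than pre-training from scratch.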

Very much looking forward to this project :slight_smile: