Left-to-right (left-context / causal) attention mask generation with BertGeneration and RobertaForCausalLM
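
For reference, here is a minimal sketch of what I mean by a left-to-right mask, assuming `roberta-base` weights. My understanding is that `RobertaForCausalLM` only applies a causal mask when `config.is_decoder=True` (the lower-triangular construction below is just my illustration of the mask shape, not necessarily how the library builds it internally):

```python
import torch
from transformers import RobertaConfig, RobertaForCausalLM, RobertaTokenizer

# A left-to-right (causal) mask: position i may only attend to positions <= i.
seq_len = 5
causal_mask = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.long))
print(causal_mask)

# As far as I can tell, RobertaForCausalLM behaves like plain bidirectional
# RoBERTa unless the config marks it as a decoder.
config = RobertaConfig.from_pretrained("roberta-base")
config.is_decoder = True
model = RobertaForCausalLM.from_pretrained("roberta-base", config=config)

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
inputs = tokenizer("Hello world", return_tensors="pt")

# Passing labels computes the causal LM loss (next-token prediction).
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)
```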

Oh, and it seems that @patrickvonplaten implemented or is involved with these models; maybe you could point me to where the mask is actually created? That would be very helpful :pray: Thanks in advance.