Hi, you could try reducing max_length.
For a bert-base model, I found that I needed to keep max_length x batch_size below about 8192. I think that limit would be even lower for a bert-large model.
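As a rough sketch of what I mean (assuming you're using the Hugging Face transformers tokenizer/Trainer; adjust names to your setup), you can shrink both knobs so their product stays well under that budget:

```python
# Minimal sketch: keep max_length x batch_size small enough to fit in memory.
from transformers import AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("roberta-large")

max_length = 256   # instead of the default 512
batch_size = 16    # 256 x 16 = 4096, under the ~8192 that worked for me on bert-base

# Truncate/pad every example to max_length tokens
encodings = tokenizer(
    ["your texts here"],        # replace with your actual data
    truncation=True,
    max_length=max_length,
    padding="max_length",
)

# Pass the smaller batch size to the Trainer
training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=batch_size,
)
```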
Do you need roberta-large, or would roberta-base be sufficient?
(Or even distilroberta-base)
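If a smaller model is acceptable, switching is usually just a one-line change of the checkpoint name (sketch below assumes a sequence classification task, swap in whatever head you're using):

```python
# Minimal sketch: load a smaller checkpoint instead of roberta-large
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilroberta-base"   # or "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
```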