That should work! We are also working on dataset streaming for very large datasets; see the PR here: https://github.com/huggingface/datasets/pull/2375. Also, RoBERTa large can fit a batch size of up to 512 or 1024 on a TPUv3-8 at a sequence length of 128 (most of the time one actually starts with a sequence length of just 128).
So this is definitely a doable project!
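
To make the batch size numbers a bit more concrete, here is a rough back-of-the-envelope sketch of what they mean per TPU core. It assumes the global batch is split evenly across the 8 cores of a TPUv3-8; the helper names (`per_core_batch_size`, `tokens_per_step`) are just made up for illustration and are not part of any library.

```python
# Rough arithmetic only: assumes the global batch is sharded evenly
# across the 8 cores of a TPUv3-8 (plain data parallelism).
NUM_TPU_CORES = 8
SEQ_LENGTH = 128


def per_core_batch_size(global_batch_size, num_cores=NUM_TPU_CORES):
    """Number of examples each core processes per step (hypothetical helper)."""
    if global_batch_size % num_cores != 0:
        raise ValueError("global batch size must be divisible by the number of cores")
    return global_batch_size // num_cores


def tokens_per_step(global_batch_size, seq_length=SEQ_LENGTH):
    """Total tokens seen per optimizer step (hypothetical helper)."""
    return global_batch_size * seq_length


for global_batch_size in (512, 1024):
    print(
        f"global batch {global_batch_size}: "
        f"{per_core_batch_size(global_batch_size)} examples per core, "
        f"{tokens_per_step(global_batch_size)} tokens per step"
    )
```

So a global batch of 512 comes out to 64 examples per core, and 1024 to 128 per core, each at 128 tokens per example.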