Pretraining LongT5 on a custom dataset

I would like to continue pretraining LongT5 on a custom dataset and then fine-tune it. Ideally, I would train a tokenizer on my data, merge it with the LongT5 tokenizer, and then continue pretraining from the published LongT5 checkpoint. Has anyone tried this, or is anyone aware of any good resources?
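
To make the tokenizer-merging step concrete, here is a rough sketch of what I have in mind, in plain Python with no library dependencies. The function name `merge_vocab` and the toy vocabularies are made up for illustration; they are not part of any LongT5 or tokenizer API. The idea is to keep every existing token ID unchanged (so the pretrained embeddings stay valid) and append only the tokens that the base vocabulary does not already contain.

```python
def merge_vocab(base_vocab: dict[str, int], new_tokens: list[str]) -> tuple[dict[str, int], list[str]]:
    """Append unseen tokens to a copy of base_vocab, keeping existing IDs fixed.

    Hypothetical helper for illustration only.
    Returns the merged vocabulary and the list of tokens that were actually added.
    """
    merged = dict(base_vocab)  # copy so the original mapping is not mutated
    added = []
    next_id = max(base_vocab.values(), default=-1) + 1
    for token in new_tokens:
        if token not in merged:  # skip tokens already known (and duplicates)
            merged[token] = next_id
            added.append(token)
            next_id += 1
    return merged, added


# Toy example: a tiny stand-in for the base vocabulary, plus tokens
# from a tokenizer trained on the custom data.
base = {"<pad>": 0, "</s>": 1, "▁the": 2}
domain_tokens = ["▁the", "▁genome", "▁exon", "▁genome"]

merged, added = merge_vocab(base, domain_tokens)
print(added)
print(len(merged))
```

As far as I understand, with Hugging Face transformers the equivalent would be passing the added tokens to `tokenizer.add_tokens(...)` and then calling `model.resize_token_embeddings(len(tokenizer))`, which leaves the new embedding rows untrained until the continued pretraining updates them. I am not sure whether this is the recommended way to do it for a SentencePiece-based tokenizer like the one LongT5 uses, which is part of what I am asking.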