I think that could indeed be the issue. Since the tokenizer has a different vocabulary size, it is likely incompatible with the config you are loading, which still contains the vocab size of the original model. You can fix it with:
```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained(model_checkpoint, vocab_size=len(tokenizer))
```
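For context, here is a minimal sketch of how that fits into the full setup, assuming you are training a causal LM from scratch (the checkpoint name and model class are placeholders; swap in whatever you actually use):

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_checkpoint = "gpt2"  # placeholder; use your own checkpoint

# Stands in for your retrained tokenizer with the new vocabulary.
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# Override the stored vocab size so the embedding matrix matches the tokenizer.
config = AutoConfig.from_pretrained(model_checkpoint, vocab_size=len(tokenizer))

# Instantiate a freshly initialized model from the patched config.
model = AutoModelForCausalLM.from_config(config)
assert model.get_input_embeddings().weight.shape[0] == len(tokenizer)
```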
I hope this helps!
PS: Sometimes these CUDA errors are unreadable, and it can help to run the code on the CPU for debugging purposes instead (passing no_cuda=True to TrainingArguments should do the trick; note that training_args.device itself is a read-only property and can't be assigned directly).
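A minimal sketch of that, assuming a standard Trainer setup (output_dir is a placeholder):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="debug-run",  # placeholder output directory
    no_cuda=True,  # force CPU so errors surface as readable Python tracebacks
    # (on recent transformers versions, use_cpu=True replaces no_cuda=True)
)
```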