I would like to fine-tune a DeBERTa-v3 PyTorch model on SQuAD v2 and then other downstream Q&A tasks.
The problem is that the doc_stride option in my arguments triggers the following error:
NotImplementedError: return_offset_mapping is not available when using Python tokenizers. To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast. More information on available tokenizers at https://github.com/huggingface/transformers/pull/2674
because there is no PreTrainedTokenizerFast for deberta-v3 yet. So…
Can I use the deberta-v2 PreTrainedTokenizerFast instead? I'd like to think that since v3's main change is the switch to ELECTRA-style pretraining, the tokenizer may be unaffected, so maybe I can get away with the v2 tokenizer? Or is that just wishful thinking?
Also, it just so happens that v3 only comes in “base”, “large”, and “xsmall”, while v2 only has all the other sizes… I would assume that, because of vocab-size differences, token indices and embeddings will differ across model sizes. Mixing and matching them sounds like a recipe for disaster…
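One way I could check that worry empirically is to load a v2 and a v3 tokenizer side by side and compare them (the checkpoint names below are just examples; any v2/v3 pair would do):

```python
# Empirical check: do the v2 and v3 tokenizers agree on vocab and token ids?
# Requires network access to the Hub; checkpoint names are illustrative.
from transformers import AutoTokenizer

tok_v2 = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge")
tok_v3 = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

sample = "Stride and offsets are handled by the tokenizer."
ids_v2 = tok_v2(sample)["input_ids"]
ids_v3 = tok_v3(sample)["input_ids"]

print(len(tok_v2), len(tok_v3))  # total vocab sizes of each tokenizer
print(ids_v2 == ids_v3)          # False would mean the vocabs aren't interchangeable
```

If the id sequences differ, that would confirm the tokenizers can't be swapped between checkpoints without re-aligning the embeddings.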
Any suggestions on how to proceed are much appreciated!