No PreTrainedTokenizerFast for DeBERTa-v3, no doc_stride

Hi All

I would like to fine-tune a DeBERTa-v3 PyTorch model on SQuAD v2 and then on other downstream Q&A tasks.

The problem is that passing the doc_stride option in my arguments triggers the following error:

```
NotImplementedError: return_offset_mapping is not available when using Python tokenizers.To use this feature, change your tokenizer to one deriving from transformers.PreTrainedTokenizerFast.More information on available tokenizers at https://github.com/huggingface/transformers/pull/2674
```

because there is no PreTrainedTokenizerFast for DeBERTa-v3 yet. So…
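
For concreteness, here is a minimal sketch of the kind of call that hits this error (the model name and hyperparameters are just illustrative of my setup):

```python
from transformers import AutoTokenizer

question = "Who created SQuAD?"
context = "SQuAD is a reading comprehension dataset built from Wikipedia articles."

# No fast tokenizer exists for DeBERTa-v3, so this loads the slow Python one.
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

encoded = tokenizer(
    question,
    context,
    max_length=384,
    truncation="only_second",
    stride=128,                    # this is the doc_stride value
    return_overflowing_tokens=True,
    return_offsets_mapping=True,   # this is what raises NotImplementedError
)
```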

Can I use the DeBERTa-v2 PreTrainedTokenizerFast instead? I would like to think that since the v3 change was a switch to ELECTRA-style pretraining, it may not affect the tokenizer at all, so maybe I can get away with the v2 tokenizer? Is this just wishful thinking?
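
If the two tokenizers really are compatible, I imagine something like the following might work; this is pure speculation on my part, and DebertaV2TokenizerFast may not even exist in a given transformers release:

```python
from transformers import DebertaV2TokenizerFast  # may not exist in older releases

# Point the v2 fast tokenizer class at the v3 checkpoint's SentencePiece files.
# Whether the two are actually compatible is exactly my open question.
tokenizer = DebertaV2TokenizerFast.from_pretrained("microsoft/deberta-v3-base")

print(tokenizer.is_fast)  # True if the fast conversion worked
print(tokenizer.tokenize("A quick sanity-check sentence."))
```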

Also, it just so happens that v3 only comes in “base”, “large” and “xsmall”, while v2 only comes in all the other sizes… I would suppose that, because of the vocabulary-size difference, token indices and embeddings will differ across model sizes. That sounds like a recipe for disaster if I mix and match them…
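
One cheap sanity check I could run, assuming both checkpoints load, is to compare the two vocabularies and the ids they assign to the same text:

```python
from transformers import AutoTokenizer

# Slow tokenizers are enough for a vocabulary comparison.
v2 = AutoTokenizer.from_pretrained("microsoft/deberta-v2-xlarge", use_fast=False)
v3 = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base", use_fast=False)

print(len(v2), len(v3))  # vocabulary sizes

sample = "Striding over long SQuAD contexts."
print(v2(sample)["input_ids"])
print(v3(sample)["input_ids"])  # differing ids would confirm the mismatch
```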

Any suggestions on how to proceed are much appreciated!

SteX