How to fine-tune a DeBERTa model on the SQuAD dataset?

bhadresh-savani · January 26, 2021, 11:43am

Hi All,

I am trying to fine-tune a DeBERTa model on SQuAD. Existing notebooks, like this one, use the Trainer and a fast tokenizer.

DeBERTa doesn't support a fast tokenizer yet. How can I fine-tune it on SQuAD?

I am also willing to implement a fast tokenizer for the DeBERTa model. Can anyone point me to resources so I can get started?

Here is my notebook for training DeBERTa (I am running into issues).

lewtun · January 26, 2021, 1:01pm

Hi @bhadresh-savani, as far as I can tell, the problem lies with your find_sublist_indices function, not with the availability of a fast tokenizer.

One simple thing to try: can you pass a slice of examples to your convert_to_features function, e.g.

    convert_to_features(train_dataset[:3])

I’m not sure whether this will solve the problem, but perhaps your find_sublist_indices expects a list of lists, which is what you’ll get from the slice.
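Since the original find_sublist_indices isn't shown in the thread, here is a hypothetical version to illustrate the shape issue. Slicing a 🤗 Datasets dataset (e.g. train_dataset[:3]) returns a dict of columns, each holding a list, so tokenized columns arrive as lists of lists. A helper written for one example then has to be applied row by row. The names find_sublist_indices and find_batch and the toy token ids are assumptions for illustration only.

```python
from typing import List, Optional, Tuple


def find_sublist_indices(sub: List[int], full: List[int]) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return inclusive (start, end) of the first occurrence of
    `sub` inside `full`, or None if it does not occur."""
    n = len(sub)
    if n == 0:
        return None
    for i in range(len(full) - n + 1):
        if full[i:i + n] == sub:
            return i, i + n - 1
    return None


def find_batch(subs: List[List[int]], fulls: List[List[int]]) -> List[Optional[Tuple[int, int]]]:
    # A slice or a batched map passes lists of lists, so apply the
    # single-example helper to each (answer, context) pair in turn.
    return [find_sublist_indices(s, f) for s, f in zip(subs, fulls)]


# Toy token ids: two contexts and their answers.
context_ids = [[101, 7, 8, 9, 102], [101, 4, 5, 6, 5, 102]]
answer_ids = [[8, 9], [5]]
print(find_batch(answer_ids, context_ids))  # [(2, 3), (2, 2)]
```

Passing a single example's lists straight to find_batch (or a batch straight to find_sublist_indices) would silently compare the wrong things, which is the kind of mismatch the slice test above would surface.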

I also noticed that your convert_to_features function is quite different from the prepare_train_features function in the tutorial. What happens if you try the latter with your tokenizer?
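For comparison, the core of prepare_train_features is turning the answer's character span into start and end token positions. The tutorial gets per-token character offsets by calling the tokenizer with return_offsets_mapping=True, which Python (slow) tokenizers do not provide, so that is where the fast-tokenizer dependency comes from. Below is a minimal sketch of only the labeling step, assuming you already have (start, end) character offsets for each context token and None for question and special tokens. The function label_span and the toy offsets are hypothetical.

```python
from typing import List, Optional, Tuple

Offsets = List[Optional[Tuple[int, int]]]


def label_span(offsets: Offsets, answer_start: int, answer_text: str,
               cls_index: int = 0) -> Tuple[int, int]:
    """Hypothetical helper: map a character-level answer to token-level (start, end).

    offsets[i] is the (char_start, char_end) of token i in the context, or None for
    tokens outside the context. If the answer is not fully inside this window, both
    labels point at cls_index, mirroring the tutorial's convention.
    """
    answer_end = answer_start + len(answer_text)  # exclusive character end
    context_tokens = [i for i, o in enumerate(offsets) if o is not None]
    if not context_tokens:
        return cls_index, cls_index
    first, last = context_tokens[0], context_tokens[-1]
    # Answer falls (partly) outside the context tokens of this window.
    if offsets[first][0] > answer_start or offsets[last][1] < answer_end:
        return cls_index, cls_index
    # Start: first context token that ends after the answer begins.
    start = first
    while start <= last and (offsets[start] is None or offsets[start][1] <= answer_start):
        start += 1
    # End: last context token that starts before the answer ends.
    end = last
    while end >= first and (offsets[end] is None or offsets[end][0] >= answer_end):
        end -= 1
    return start, end


# Context "The cat sat on the mat"; [CLS], one question token and [SEP] come first.
offsets = [None, None, None, (0, 3), (4, 7), (8, 11), (12, 14), (15, 18), (19, 22), None]
print(label_span(offsets, 15, "the mat"))  # (7, 8)
```

If you can produce such offsets some other way, the rest of the tutorial's logic (including the doc_stride windows) could in principle be rebuilt around them.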

If that doesn’t work, you might be able to use the old run_qa.py script, which doesn’t rely on fast tokenizers: transformers/examples/legacy/question-answering (huggingface/transformers on GitHub)

Lewis

bhadresh-savani · January 27, 2021, 10:20am

Hi @lewtun

Thanks for your answer

I tried running the old version (v3.5.1) while keeping the latest modeling_deberta.py file with a few changes (I needed the QuestionAnsweringModelOutput class for SQuAD-style training).

I got the following error:

    Traceback (most recent call last):
      File "run_squad.py", line 820, in <module>
        main()
      File "run_squad.py", line 734, in main
        model = AutoModelForQuestionAnswering.from_pretrained(
      File "/media/data2/anaconda/envs/transformers-hugginface/lib/python3.8/site-packages/transformers/modeling_auto.py", line 1330, in from_pretrained
        raise ValueError(
    ValueError: Unrecognized configuration class <class 'transformers.configuration_deberta.DebertaConfig'> for this kind of AutoModel: AutoModelForQuestionAnswering.
    Model type should be one of DistilBertConfig, AlbertConfig, CamembertConfig, BartConfig, LongformerConfig, XLMRobertaConfig, RobertaConfig, SqueezeBertConfig, BertConfig, XLNetConfig, FlaubertConfig, MobileBertConfig, XLMConfig, ElectraConfig, ReformerConfig, FunnelConfig, LxmertConfig.
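Judging from the message, the ValueError comes from the Auto class rather than from modeling_deberta.py: AutoModelForQuestionAnswering looks the config's class up in a fixed table of config-to-model pairs, and DebertaConfig is not among the listed keys in that version. The toy below is a simplified, hypothetical model of that lookup (stand-in classes, not the library's code). It suggests why patching the modeling file alone is not enough: either the table needs an entry, or the patched model class presumably has to be loaded directly instead of through AutoModelForQuestionAnswering.

```python
from collections import OrderedDict


# Stand-in classes for illustration; they are NOT the transformers classes.
class BertConfig:
    pass


class DebertaConfig:
    pass


class BertForQuestionAnswering:
    pass


class DebertaForQuestionAnswering:
    pass


# Hypothetical miniature of a config-class -> model-class registry.
QA_MAPPING = OrderedDict([(BertConfig, BertForQuestionAnswering)])


def auto_qa_model_class(config):
    """Return the model class registered for this config's type, else raise."""
    for config_class, model_class in QA_MAPPING.items():
        if isinstance(config, config_class):
            return model_class
    raise ValueError(
        f"Unrecognized configuration class {config.__class__} for this kind of AutoModel. "
        f"Model type should be one of {', '.join(c.__name__ for c in QA_MAPPING)}."
    )


try:
    auto_qa_model_class(DebertaConfig())
except ValueError as err:
    print(err)  # same shape of error as in the traceback

# Registering the missing pair makes the lookup succeed.
QA_MAPPING[DebertaConfig] = DebertaForQuestionAnswering
print(auto_qa_model_class(DebertaConfig()).__name__)  # DebertaForQuestionAnswering
```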

I created find_sublist_indices based on this notebook, which uses a fast tokenizer; I am trying to do the same without one.

The fast tokenizer has a method called char_to_token; I am trying to implement the equivalent for the Python-based (slow) tokenizer.
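One way to approximate char_to_token with a slow tokenizer is to rebuild character offsets yourself: turn each token back into its text piece, then scan the original context left to right to find where each piece sits. The sketch below assumes you already have those pieces in reading order (one possible source, untested here, is calling tokenizer.convert_tokens_to_string on each token) and that each piece, stripped of whitespace, appears verbatim in the text. compute_offsets and this char_to_token are hypothetical helpers, not library functions.

```python
from typing import List, Optional, Tuple

Offsets = List[Optional[Tuple[int, int]]]


def compute_offsets(text: str, pieces: List[str]) -> Offsets:
    """Hypothetical helper: align decoded token pieces to character spans in `text`.

    Scans left to right; a piece that is empty after stripping, or that cannot be
    found past the current position, gets None.
    """
    offsets: Offsets = []
    cursor = 0
    for piece in pieces:
        piece = piece.strip()
        if not piece:
            offsets.append(None)
            continue
        start = text.find(piece, cursor)
        if start == -1:
            offsets.append(None)
            continue
        end = start + len(piece)
        offsets.append((start, end))
        cursor = end  # never match earlier text again
    return offsets


def char_to_token(offsets: Offsets, char_index: int) -> Optional[int]:
    """Hypothetical helper: index of the token whose span contains char_index,
    or None (e.g. for whitespace between tokens)."""
    for i, span in enumerate(offsets):
        if span is not None and span[0] <= char_index < span[1]:
            return i
    return None


text = "DeBERTa improves BERT"
pieces = ["De", "BER", "Ta", " improves", " BERT"]  # toy decoded pieces
offsets = compute_offsets(text, pieces)
print(offsets)                    # [(0, 2), (2, 5), (5, 7), (8, 16), (17, 21)]
print(char_to_token(offsets, 9))  # 3
print(char_to_token(offsets, 7))  # None (the space)
```

The scan breaks down if the tokenizer normalizes text (lowercasing, accent stripping) or yields byte-level pieces that do not match the raw characters; in that case the text would need the same normalization before matching.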

Hi @valhalla,

Since you created that notebook, can you tell me how I can use it without a fast tokenizer?
