How can I evaluate my fine-tuned model on Squad?

theudster · May 4, 2021, 10:32am

Hello,

I have loaded the already fine-tuned SQuAD model 'twmkn9/bert-base-uncased-squad2'.
I would now like to evaluate it on the SQuAD2 dataset. How would I do that?
This is my code currently:

from transformers import AutoTokenizer, AutoModelForQuestionAnswering, AutoConfig

model_name = 'twmkn9/bert-base-uncased-squad2'

config = AutoConfig.from_pretrained(model_name, num_hidden_layers=10)
tokenizer = AutoTokenizer.from_pretrained(model_name)  
model = AutoModelForQuestionAnswering.from_config(config)

Now I am just unsure what to do next?

UPDATE

I am trying to follow the instructions from here, yet I am unsure how to use my own model. This is what I have:

# Grab the run_qa.py script and its helper modules

!curl -L -O https://raw.githubusercontent.com/huggingface/transformers/master/examples/pytorch/question-answering/run_qa.py

!curl -L -O https://raw.githubusercontent.com/huggingface/transformers/master/examples/pytorch/question-answering/trainer_qa.py

!curl -L -O https://raw.githubusercontent.com/huggingface/transformers/master/examples/pytorch/question-answering/utils_qa.py 


!python run_qa.py  \
    --model_type bert   \
    --model_name_or_path model  \
    --output_dir models/distilbert/twmkn9_distilbert-base-uncased-squad2 \
    --data_dir data/squad   \
    --predict_file dev-v2.0.json   \
    --do_eval   \
    --version_2_with_negative \
    --do_lower_case  \
    --per_gpu_eval_batch_size 12   \
    --max_seq_length 384   \
    --doc_stride 128

But I am getting an error

2021-05-04 11:14:04.086537: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "run_qa.py", line 613, in <module>
    main()
  File "run_qa.py", line 208, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/usr/local/lib/python3.7/dist-packages/transformers/hf_argparser.py", line 187, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 20, in __init__
  File "run_qa.py", line 184, in __post_init__
    raise ValueError("Need either a dataset name or a training/validation file/test_file.")
ValueError: Need either a dataset name or a training/validation file/test_file.

I imagine this is because --model_name_or_path model is no good, but then how do I call my own configured model?

Thank you

sgugger · May 4, 2021, 12:20pm

The error comes before your model is even loaded (you will get another one for the model afterwards). First, you have to replace --predict_file with --test_file in your command. Then, if your model is in models/distilbert/twmkn9_distilbert-base-uncased-squad2, this is what you should pass as --model_name_or_path (the --model_type argument is useless; it will be inferred from the model files).

theudster · May 4, 2021, 1:05pm

Thank you!
Will passing models/distilbert/twmkn9_distilbert-base-uncased-squad2 load the model that I configured? I haven’t saved it anywhere, I just created it in my notebook (like my code above).

sgugger · May 4, 2021, 1:08pm

Then you need to pass twmkn9/bert-base-uncased-squad2
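
For reference, a minimal sketch of an evaluation-only set of arguments for run_qa.py, built from Python so it can be generated in a notebook. It assumes the script version discussed in this thread, which accepts a single .json file of arguments as an alternative to command-line flags. Loading the data through dataset_name "squad_v2" instead of a local file, the helper name build_eval_args, and the output directory "eval_output" are assumptions made for illustration.

```python
import json


def build_eval_args(model_name_or_path, output_dir):
    """Return run_qa.py arguments for an evaluation-only SQuAD v2 run."""
    return {
        # Hub id or local directory containing the saved model
        "model_name_or_path": model_name_or_path,
        # Assumption: pull SQuAD v2 from the Hub rather than a local JSON file
        "dataset_name": "squad_v2",
        "do_eval": True,
        "version_2_with_negative": True,
        "per_device_eval_batch_size": 12,
        "max_seq_length": 384,
        "doc_stride": 128,
        "output_dir": output_dir,
    }


# "eval_output" is a hypothetical directory name
eval_args = build_eval_args("twmkn9/bert-base-uncased-squad2", "eval_output")
print(json.dumps(eval_args, indent=2))
```

Writing this dictionary to a file such as eval_args.json with json.dump and running python run_qa.py eval_args.json should be equivalent to passing the same names as --flags. Note that --model_type, --data_dir, --predict_file and --do_lower_case from the original command do not appear here.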

theudster · May 4, 2021, 1:11pm

Cool! Just to try and gain an understanding, how does this work? As in, how does it know from this command to load my configured model (which had layers cut off)?

theudster · May 4, 2021, 1:25pm

UPDATE:

This seemed to call the original model, not my configured one.

This is what I’m getting at initialisation:

[INFO|configuration_utils.py:553] 2021-05-04 13:19:23,488 >> Model config BertConfig {
  "architectures": [
    "BertForQuestionAnswering"
  ],
  "attention_probs_dropout_prob": 0.1,
  "gradient_checkpointing": false,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.6.0.dev0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

But the “num_hidden_layers” should be 10 according to my configuration

sgugger · May 4, 2021, 1:32pm

I’m really confused. How do you want to call your modified model if you did not save it anywhere?

theudster · May 4, 2021, 1:35pm

Fair enough, that’s what I was unsure about.
Would saving it in the Colab be enough? I wouldn’t want to upload these to Hugging Face…

Alternatively, I was thinking maybe there was a way to change the model’s configuration from within the script.

theudster · May 4, 2021, 3:00pm

@sgugger thank you so much, I got it all working… Now I have an issue with training time, though. I froze all layers but the head and am trying to train using the script. It’s working but giving me an estimated time of 8 hours! Is that normal?

(to do this, I configured my model to have 10 layers, added

for param in model.base_model.parameters():
    param.requires_grad = False

to freeze the layers and then saved the model. I loaded this model in the script)

sgugger · May 4, 2021, 3:35pm

No, you need to add this freezing in the script. Reloading the model will remove the freezing.

theudster · May 4, 2021, 3:40pm

Wow! What can’t this script do?

Sorry to bother you, and thank you for being so helpful, but how would I do that?
I looked through trainer_qa.py, utils_qa.py and run_qa.py and couldn’t find a requires_grad parameter to change.

sgugger · May 4, 2021, 4:43pm

No, you need to add the lines of code inside the example script. It’s not baked inside.

theudster · May 5, 2021, 1:08pm

Hi! So I added those lines to the run_qa.py file just before the trainer is created.
I trained (which took 1.5 hours) but then got terrible F1 results (5.9).
I hypothesised that using 10 instead of 12 layers would give a worse output, but not this much worse.
Is it possible that even the qa_head was frozen and nothing was training at all?
I was wondering if you have any ideas…
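
To put a score like 5.9 in context, here is a simplified sketch of a per-question token-overlap F1, modeled on the normalization used by the official SQuAD v2 evaluation script. The function names are illustrative and this is not the exact code the script runs. The reported F1 is, as far as I understand it, the average of such per-question scores scaled to 0-100.

```python
import collections
import re
import string

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def squad_f1(gold, prediction):
    """Token-overlap F1 between one gold answer and one predicted answer."""
    gold_tokens = normalize_answer(gold).split()
    pred_tokens = normalize_answer(prediction).split()
    # Unanswerable questions: both sides must be empty to score 1.0
    if not gold_tokens or not pred_tokens:
        return float(gold_tokens == pred_tokens)
    overlap = collections.Counter(gold_tokens) & collections.Counter(pred_tokens)
    num_same = sum(overlap.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(round(squad_f1("the Eiffel Tower", "Eiffel Tower in Paris"), 3))  # 0.667
print(squad_f1("", ""))       # 1.0
print(squad_f1("", "Paris"))  # 0.0
```

Under this definition, an average near 6 out of 100 means the predicted spans share almost no tokens with the reference answers on most questions.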

sgugger · May 5, 2021, 1:25pm

You are not using a pretrained model anymore once you reduce the number of layers, so why would you freeze parameters? This only makes sense when fine-tuning a model, not when training from scratch.

theudster · May 5, 2021, 1:29pm

Hi, basically I would like to see what BERT learns across the different layers (how much it has already figured out by the nth layer).
So I was thinking of taking a model fine-tuned for QA and cutting, let’s say, the last 2 layers. I was assuming that the rest of the layers keep their old weights.
Now it has 10 layers + the qa_head. Then I freeze all the layers and fine-tune it so that the final layer ‘connects’ to the qa_head, so that I can probe how much that layer knows.
Does that make sense for such an experiment?
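
On the assumption that the remaining layers keep their old weights: the code in the first post builds the model with from_config, which creates a randomly initialized model from the configuration alone, so no fine-tuned weights are loaded. Below is a minimal sketch of one way to keep the weights of the first layers, assuming that passing num_hidden_layers to from_pretrained overrides the configuration and that the checkpoint weights for the dropped layers are simply reported as unused. The helper name and the output directory are hypothetical.

```python
def truncate_and_save(source, num_layers, output_dir):
    """Load a QA checkpoint with only its first num_layers encoder layers and save it."""
    # Imported lazily so that defining the helper does not require the library
    from transformers import AutoModelForQuestionAnswering, AutoTokenizer

    # from_pretrained loads the checkpoint weights; the keyword argument
    # overrides the matching attribute of the checkpoint's configuration
    model = AutoModelForQuestionAnswering.from_pretrained(
        source, num_hidden_layers=num_layers
    )
    tokenizer = AutoTokenizer.from_pretrained(source)

    # Save both so the directory can be passed as --model_name_or_path
    model.save_pretrained(output_dir)
    tokenizer.save_pretrained(output_dir)
    return model
```

For example, truncate_and_save("twmkn9/bert-base-uncased-squad2", 10, "bert-squad2-10-layers") would write a local directory that run_qa.py can load without anything being uploaded. Any freezing still has to be applied inside the script, since requires_grad is not stored with the saved weights.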
