Unit of max_answer_length in run_qa.py script?

I’m currently trying to finetune my BERT model on a question answering task using the run_qa.py script. I’m curious what the unit of the max_answer_length argument is: is it the length in characters, or in tokens after tokenization?

Also, is there any guidance on which value works best? Does it make sense to set it to the length of the longest answer in the dataset (which seems risky when one answer is very long and the rest are short), or to use something like the average answer length instead?

Thanks in advance!

It’s in tokens. We usually use the same default as the original Google script, which works quite well in practice.
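To illustrate how a token-based max_answer_length is typically applied, here is a minimal sketch of the span-filtering step done during QA post-processing. The function name and inputs are illustrative, not the exact run_qa.py code: candidate (start, end) token-index pairs whose span exceeds max_answer_length tokens are simply discarded.

```python
def filter_spans(start_indices, end_indices, max_answer_length):
    """Keep (start, end) token-index pairs whose span length is valid."""
    candidates = []
    for start in start_indices:
        for end in end_indices:
            # Span length is measured in tokens, inclusive of both ends.
            length = end - start + 1
            if end < start or length > max_answer_length:
                continue
            candidates.append((start, end))
    return candidates

# Example: the span 3..40 (38 tokens) is dropped, 3..5 (3 tokens) is kept.
spans = filter_spans([3, 10], [5, 40], max_answer_length=30)
print(spans)  # [(3, 5)]
```

Because the limit is enforced on token indices like this, a very large max_answer_length mainly increases the number of candidate spans to score rather than improving accuracy, which is why the default tends to work well.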