Why do training scripts for fine-tuning BERT-based models on SQuAD (e.g., this one from Google or this one from HuggingFace) set a maximum length of 384 (by default) for input sequences even though the models can handle inputs of length up to 512? (This maximum length refers to the combined length of the question and context, right? Regardless, the questions in the SQuAD dataset typically have length significantly less than 128.)
We use the same default as the Google scripts in order to reproduce their results. My guess is that 384 was a compromise for the regular SQuAD dataset: large enough that most question/context pairs can be tokenized without any truncation, while keeping the sequences small enough to train fast.
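To make the budget concrete, here is a rough back-of-the-envelope sketch of how a 384-token limit splits between question and context in BERT's `[CLS] question [SEP] context [SEP]` packing. The 64-token question cap is a hypothetical illustration (SQuAD questions are usually far shorter), not a value taken from the scripts:

```python
# BERT-style QA input layout: [CLS] question [SEP] context [SEP]
MAX_LEN = 384          # default max_seq_length in the SQuAD fine-tuning scripts
MAX_QUESTION_LEN = 64  # hypothetical cap for illustration; real questions are shorter

SPECIAL_TOKENS = 3     # one [CLS] plus two [SEP] tokens

# Whatever the question does not use (plus the special tokens) is
# left over for the context passage.
context_budget = MAX_LEN - MAX_QUESTION_LEN - SPECIAL_TOKENS
print(context_budget)  # → 317 tokens available for the context
```

Since most SQuAD contexts fit comfortably within that budget, 384 truncates relatively few examples, while a 512 limit would roughly double the attention cost (self-attention scales quadratically with sequence length) for little gain in coverage.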