So does that mean max_position_embedding
is reduced for fine-tuned models (gigaword, wikihow)? I.e., if max_position_embedding
for the pre-trained model is 1024, then all fine-tuned models should also have the same value, right?