# Is there a pre-trained BERT model with the sequence length 2048?

Hello,
I want to use a pre-trained BERT model because I do not want to train the entire BERT model myself to analyze my data. Is there a pre-trained BERT model with a sequence length of 2048?

Or do all pre-trained BERT models only have a maximum sequence length of 512?

Thank you.

Hi, instead of `Bert`, you may be interested in `Longformer`, which has pretrained weights for a sequence length of 4096.


I’ve not seen a pre-trained BERT with a sequence length of 2048.
Training cost grows with the square of the sequence length, so length 2048 would be very expensive to pre-train. Even if such a model existed, I think you would need a TPU just to fine-tune it.
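The quadratic growth is easy to see from the self-attention score matrix, which has one entry per pair of tokens. A small sketch of the count for a few sequence lengths:

```python
# Each self-attention head materializes a seq_len x seq_len score matrix,
# so the number of entries grows quadratically with sequence length.
entries = {seq_len: seq_len * seq_len for seq_len in (512, 1024, 2048)}
for seq_len, n in entries.items():
    print(f"seq_len={seq_len}: {n:,} attention entries per head")
```

Going from 512 to 2048 is a 4x longer input but a 16x larger attention matrix, which is why long-context pre-training is so costly.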