Hello everyone. If I understand correctly, the BERT model comes in multiple flavors that can be chosen depending on one's needs. I have some texts on which I want to run multiclass linear regression over their embeddings. However, these texts are up to 1000 tokens long, and the base BERT model cannot easily process inputs of that length. In the Transformers documentation I read:
- max_position_embeddings (int, optional, defaults to 512) – The maximum sequence length that this model might ever be used with. Typically set this to something large just in case (e.g., 512 or 1024 or 2048).
If the network has not been pretrained with this input size, this obviously will not work. However, the bert-large-uncased model suggests that it might be possible, since it has 1024 hidden states.
Trying to run the model on my input yields:
The size of tensor a (803) must match the size of tensor b (512) at non-singleton dimension 1
This is expected if the model is configured to accept 512 tokens at maximum.
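If I understand the internals correctly, the mismatch comes from adding the word embeddings for 803 tokens to a position-embedding table that only has 512 pretrained rows. A minimal reproduction in plain PyTorch (the shapes here are my assumption about what happens inside the model, not taken from the library source):

```python
import torch

# Word embeddings for an 803-token input (batch 1, hidden size 1024)...
word_embeddings = torch.randn(1, 803, 1024)
# ...added to a position-embedding table with only 512 entries.
position_embeddings = torch.randn(1, 512, 1024)

try:
    word_embeddings + position_embeddings
except RuntimeError as e:
    print(e)  # The size of tensor a (803) must match the size of tensor b (512)...
```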
I have tried reconfiguring the large model to allow more tokens without any success.
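For reference, my reconfiguration attempt looked roughly like this (a sketch; the hyperparameters are copied by hand from the bert-large-uncased config.json, and the commented-out loading step is the part that did not work for me):

```python
from transformers import BertConfig

# bert-large-uncased hyperparameters, copied by hand from its config.json,
# but with the positional limit raised from 512 to 1024:
config = BertConfig(hidden_size=1024, num_hidden_layers=24,
                    num_attention_heads=16, intermediate_size=4096,
                    max_position_embeddings=1024)

# Loading the pretrained weights into this config is where it fails for me,
# since the checkpoint only ships 512 learned position embeddings:
# from transformers import BertModel
# model = BertModel.from_pretrained("bert-large-uncased", config=config)
```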
Is it possible to use inputs of up to 1024 tokens? If so, how? And are there pretrained models for this?
If not, what model would you suggest instead as a baseline architecture for texts of this length?
All the best