Thank you @nielsr for the answer, it really helps a lot.
I just wanted to clarify that I’m not trying to do fine-tuning but training from scratch.
I usually load the tokenizer with `AutoTokenizer.from_pretrained(...)` and the config with `config = AutoConfig.from_pretrained(...)`, but the actual model is instantiated from the config via `AutoModelForSequenceClassification.from_config(config)`.
I thought this would ensure the model is initialized with random weights so that I'm training from scratch rather than starting from pretrained weights, since I'm not loading it with `checkpoint = "allenai/longformer-base-4096"` followed by `AutoModelForSequenceClassification.from_pretrained(checkpoint)`. Am I wrong in thinking that this approach guarantees training from scratch?
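Just to make sure I'm describing it accurately, this is roughly what my loading code looks like (a sketch; `num_labels=2` is a placeholder for my task):

```python
from transformers import AutoConfig, AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "allenai/longformer-base-4096"

# Tokenizer and config come from the hub checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
config = AutoConfig.from_pretrained(checkpoint, num_labels=2)  # num_labels is a placeholder

# from_config: architecture only, weights are randomly initialized (what I use)
model_scratch = AutoModelForSequenceClassification.from_config(config)

# from_pretrained: same architecture, but loads the pretrained weights (what I'm NOT doing)
model_pretrained = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
```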
My sequences have different lengths, but I think in this case I should pad them to 1024 given the current checkpoint? My targets usually consist of only 1 or 2 tokens after tokenization, but I pad them to length=10. Is that wrong?
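This is roughly how I tokenize at the moment (again a sketch; `texts` and `raw_targets` are placeholder variable names, and the max_length values are just the ones mentioned above):

```python
# Inputs: pad/truncate every sequence to a fixed length of 1024
inputs = tokenizer(
    texts,
    padding="max_length",
    truncation=True,
    max_length=1024,
    return_tensors="pt",
)

# Targets: only 1-2 tokens after tokenization, but padded out to length 10
targets = tokenizer(
    raw_targets,
    padding="max_length",
    truncation=True,
    max_length=10,
    return_tensors="pt",
)
```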
On another note, what are some other Hugging Face models with good performance on sequence classification that could be trained from scratch on my toy synthetic datasets?