from transformers import AutoModel, AutoModelForSeq2SeqLM

model_name = "t5-base"  # example checkpoint, assumed for illustration
modelSeq2Seq = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
So this modelSeq2Seq model has an extra fully connected layer (lm_head) that maps the 768-dimensional hidden states to logits over the vocabulary, but the plain model does not have it. Why?
Even in the pre-training phase, there must be a layer that converts the hidden states (embeddings) into logits over the vocabulary, right?
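For reference, here is a minimal way to see the difference using the two models loaded above (assuming model_name is a T5-style checkpoint like t5-base, where transformers names this projection lm_head):

print(hasattr(modelSeq2Seq, "lm_head"))  # True: the *ForSeq2SeqLM class adds a Linear(hidden_size -> vocab_size) head
print(hasattr(model, "lm_head"))         # False: AutoModel returns only the raw encoder/decoder hidden states
print(modelSeq2Seq.lm_head)              # e.g. Linear(in_features=768, out_features=32128, bias=False) for t5-base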