@lewtun To add on: after more testing, I now get this warning:
Some weights of TagPredictionModel were not initialized from the model checkpoint at ./experiments/checkpoint-40 and are newly initialized: ['.encoder.shared.weight', '.encoder.encoder.embed_tokens.weight', '.encoder.encoder.embed_positions.weight', '.encoder.encoder.layers.0.self_attn.k_proj.weight'…(Cut for length)
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
EDIT: Also, after rerunning evaluation on the validation set once training ends, I am almost certain that the model is not being saved correctly, because the eval loss is different from the one reported during the training loop.
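
One way I could narrow this down is to compare the parameter names the model expects against the names actually stored in the checkpoint. Below is a minimal sketch of that comparison in plain Python. The helper `diff_state_dict_keys` and the key names in the example are hypothetical and only for illustration; they are not the real keys from my checkpoint.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Compare two collections of parameter names.

    Returns (missing, unexpected):
      missing    - names the model expects but the checkpoint lacks
                   (these would be reported as "newly initialized")
      unexpected - names in the checkpoint that the model does not have
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Hypothetical key names, chosen only to show a prefix mismatch.
expected_by_model = ["encoder.shared.weight", "classifier.weight"]
stored_in_checkpoint = ["shared.weight", "classifier.weight"]

missing, unexpected = diff_state_dict_keys(expected_by_model, stored_in_checkpoint)
print("missing from checkpoint:", missing)
print("unexpected in checkpoint:", unexpected)
```

In practice the two lists would come from `model.state_dict().keys()` and from the keys of the state dict loaded from the checkpoint directory (how that file is named and loaded depends on the library version, so I am leaving it out here). If the same weights show up on both sides under different prefixes, the weights are probably being saved and the problem is in how the names are matched at load time.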