Hi
I fine-tuned MT5ForSequenceClassification on my dataset as a regression task. However, after fine-tuning, the saved model’s config lists the architecture as MT5ForConditionalGeneration. Is that expected, or is something wrong? Why did it change?
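For concreteness, this is how I’m checking the architecture (a minimal sketch; the path is a placeholder for the `--output_dir` of the run below):

```python
from transformers import AutoConfig

output_dir = "path/to/output_dir"  # placeholder for ${OUTPUT_DIR} below

# Inspect the architecture recorded in the saved config.
config = AutoConfig.from_pretrained(output_dir)
print(config.architectures)  # I get ['MT5ForConditionalGeneration'], not MT5ForSequenceClassification
```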
Also, I don’t know how to do inference with my saved model. The paper I am trying to replicate says “We adapt mT5 into a regression-metric by applying a RMSE loss between the logits of a special classification token and the label which is either 0 or 1. During inference, we then force-decode the classification token and extract its probability.”
But I’m not sure how to implement that last sentence.
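For reference, here is my best guess at what the force-decoding step would look like for the plain seq2seq head (a sketch only: I’m assuming the classification token is the string "1", and I don’t see how this maps onto my MT5ForSequenceClassification checkpoint):

```python
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-large")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-large")
model.eval()

inputs = tokenizer("some linearized input[output]some output", return_tensors="pt")

# Force-decode a single step from the decoder start token and read off
# the next-token distribution over the vocabulary.
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])
with torch.no_grad():
    logits = model(**inputs, decoder_input_ids=decoder_input_ids).logits  # (1, 1, vocab_size)

# Probability of the (assumed) classification token "1"; I take the first
# sub-token id, which may not match whatever special token the paper used.
cls_token_id = tokenizer("1", add_special_tokens=False).input_ids[0]
prob = torch.softmax(logits[0, -1], dim=-1)[cls_token_id].item()
print(prob)
```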
My eval loss is 0.2576, which isn’t bad, so I’d expect reasonable predictions at inference time, but I haven’t been able to make it work.
Any help would be appreciated. I am using this training script with these args:
--model_name_or_path ${MODEL_SCRATCH} \
--train_file ${DATA_SCRATCH}/train_stata.csv \
--validation_file ${DATA_SCRATCH}/dev_stata.csv \
--test_file ${DATA_SCRATCH}/test_stata.csv \
--do_regression True \
--metric_name rmse \
--text_column_name "linearized_input,output" \
--label_column_name attributable \
--do_train \
--do_eval \
--do_predict \
--max_seq_length 2048 \
--per_device_train_batch_size 8 \
--learning_rate 1e-4 \
--lr_scheduler_type constant \
--ignore_mismatched_sizes True \
--num_train_epochs 1 \
--output_dir ${OUTPUT_DIR} \
--overwrite_output_dir True \
--gradient_checkpointing True \
--gradient_accumulation_steps 1 \
--eval_accumulation_steps 1 \
--text_column_delimiter "[output]" \
--save_total_limit 1 \
--load_best_model_at_end True
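Based on my reading of the script, `--text_column_name` plus `--text_column_delimiter` means the two text columns get concatenated into a single input string, so at inference I try to rebuild the same string myself (a sketch; the column names are from my CSVs, and the exact spacing around the delimiter is my assumption):

```python
import pandas as pd

df = pd.read_csv("dev_stata.csv")  # path shortened; same file as --validation_file
row = df.iloc[0]

# Join the two text columns with the --text_column_delimiter token,
# mirroring (what I believe is) the script's preprocessing.
text = row["linearized_input"] + "[output]" + row["output"]
label = row["attributable"]  # the --label_column_name
```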
And here are some of the output logs:
[INFO|modeling_utils.py:3373] 2024-01-07 22:03:10,696 >> loading weights file /disk/scratch/s2029717/mt5-large-seq/model.safetensors
[INFO|modeling_utils.py:4227] 2024-01-07 22:03:17,330 >> All model checkpoint weights were used when initializing MT5ForSequenceClassification.
[WARNING|modeling_utils.py:4248] 2024-01-07 22:03:17,330 >> Some weights of MT5ForSequenceClassification were not initialized from the model checkpoint at /disk/scratch/s2029717/mt5-large-seq and are newly initialized because the shapes did not match:
- classification_head.out_proj.bias: found shape torch.Size([2]) in the checkpoint and torch.Size([1]) in the model instantiated
- classification_head.out_proj.weight: found shape torch.Size([2, 1024]) in the checkpoint and torch.Size([1, 1024]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Why am I getting this warning when I am loading a ForSequenceClassification model?
You can see the model files here.
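For completeness, this is how I’m currently trying to run inference with the fine-tuned checkpoint (`num_labels=1` to match `--do_regression`; the path is again a placeholder for my output dir):

```python
import torch
from transformers import AutoTokenizer, MT5ForSequenceClassification

output_dir = "path/to/output_dir"  # placeholder for ${OUTPUT_DIR}

tokenizer = AutoTokenizer.from_pretrained(output_dir)
model = MT5ForSequenceClassification.from_pretrained(output_dir, num_labels=1)
model.eval()

# Same input format as training: the two columns joined by the delimiter.
text = "some linearized input[output]some output"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()  # single regression logit
print(score)  # I read this as the regression score, not a calibrated probability
```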