Seq2SeqTrainer, `push_to_hub` returns None

Hi,

When I try to fine-tune the mt5-small model with Seq2SeqTrainer, I get this error:

   3550                 commit_message = f"Training in progress, epoch {int(self.state.epoch)}"
   3551             _, self.push_in_progress = self.repo.push_to_hub(
-> 3552                 commit_message=commit_message, blocking=False, auto_lfs_prune=True
   3553             )
   3554         finally:

TypeError: cannot unpack non-iterable NoneType object
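
From the traceback, it looks like self.repo.push_to_hub(...) returned None instead of the (url, future) pair the Trainer tries to unpack (if I understand correctly, the underlying Repository.push_to_hub can return None, e.g. when there is nothing new to push). A stripped-down illustration of the failure, not the actual Trainer code:

def fake_push_to_hub(**kwargs):
    # Hypothetical stand-in for Repository.push_to_hub returning None
    # instead of the expected (url, command) pair.
    return None

try:
    _, push_in_progress = fake_push_to_hub(
        commit_message="Training in progress", blocking=False
    )
except TypeError as err:
    print(err)  # cannot unpack non-iterable NoneType object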

Here is my code. I’ll start with the model & tokenizer initialization:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_ID = "google/mt5-small"
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_ID)

And here are the Seq2SeqTrainingArguments and Seq2SeqTrainer:

from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

MODEL_NAME = "mt5-bg-small"

EPOCHS = 15
L_RATE = 2e-4
W_DECAY = 0.01
TRAIN_BATCH_SIZE = 4
EVAL_BATCH_SIZE = 4

training_args = Seq2SeqTrainingArguments(
    output_dir=MODEL_NAME,
    evaluation_strategy="epoch",
    learning_rate=L_RATE,
    per_device_train_batch_size=TRAIN_BATCH_SIZE,
    per_device_eval_batch_size=EVAL_BATCH_SIZE,
    weight_decay=W_DECAY,
    save_total_limit=1,
    num_train_epochs=EPOCHS,
    # predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
    report_to="none",
    
    # Not calculating the additional metrics - only the loss.
    prediction_loss_only=True
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
trainer.train()

tokenizer.push_to_hub(MODEL_NAME)

The error occurs at the 2nd saving step (in my case, the 1000th step).
I am successfully logged into my account, using a WRITE Access Token. What might be the problem?

Please note - I am using a Kaggle Notebook with a GPU.
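
For completeness, the login is done inside the notebook itself, roughly like this (the token below is only a placeholder, not my real one):

from huggingface_hub import login

# "hf_..." is a hypothetical placeholder; in the notebook the actual WRITE token is pasted here.
login(token="hf_...")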

Thank you in advance,
Adam

Hi @auhide ,
Were you able to figure out what the problem was? I am having the exact same issue when fine-tuning a GPT-J model.

Hi @zoebat20,

I was not able to figure out the exact cause, but I did try a few different things.
What helped in my case was changing the base model: instead of google/mt5-small (1.2 GB), I used t5-small or t5-base (242 MB and 898 MB, respectively).

I guess the problem was the model size? Not sure why though.
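
Another thing that might be worth trying (I haven't verified it with mt5-small, so treat this as a sketch rather than a confirmed fix): turn off the automatic push during training and push once at the end with Trainer.push_to_hub, so the in-training push that raises the error never runs. Roughly, keeping the rest of the setup from above unchanged:

training_args = Seq2SeqTrainingArguments(
    output_dir=MODEL_NAME,
    evaluation_strategy="epoch",
    learning_rate=L_RATE,
    per_device_train_batch_size=TRAIN_BATCH_SIZE,
    per_device_eval_batch_size=EVAL_BATCH_SIZE,
    weight_decay=W_DECAY,
    save_total_limit=1,
    num_train_epochs=EPOCHS,
    fp16=True,
    push_to_hub=False,   # no push from inside the training loop
    report_to="none",
    prediction_loss_only=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
trainer.train()

# Single push after training; as far as I know this also uploads the tokenizer
# that was passed to the Trainer, so the separate tokenizer.push_to_hub call
# shouldn't be needed.
trainer.push_to_hub(commit_message="End of training")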