Longformer seemingly initializing the global attention mask at every training step

I've included an image of the problem. I'm using LongformerForSequenceClassification with the Trainer to fine-tune the model, and I can't figure out what is going wrong. Here are the model initialization and the training arguments, in case they're relevant:

from transformers import LongformerForSequenceClassification, Trainer, TrainingArguments

model = LongformerForSequenceClassification.from_pretrained('allenai/longformer-base-4096', num_labels=25)

training_args = TrainingArguments(
    output_dir='./drive/MyDrive/checkpoints',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=4,  # batch size per device during training
    per_device_eval_batch_size=8,   # batch size for evaluation
    warmup_steps=200,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./drive/MyDrive/logs',            # directory for storing logs
    logging_steps=500,
    save_strategy="epoch",
    fp16=True
)

trainer = Trainer(
    model=model,                         # the instantiated 🤗 Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=test_dataset,           # evaluation dataset
    compute_metrics=compute_metrics,
)

trainer.train()
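
In case it helps: if the message in the image is Longformer initializing global attention on the CLS token (which, as far as I understand, it does on every forward pass whenever the batch contains no global_attention_mask), would attaching an explicit global_attention_mask during tokenization be the right way to stop that? A rough sketch of what I mean (assuming a 🤗 datasets dataset; the 'text' column name and max_length=4096 are just placeholders for my setup):

from transformers import LongformerTokenizerFast

tokenizer = LongformerTokenizerFast.from_pretrained('allenai/longformer-base-4096')

def encode(batch):
    # Tokenize, padding/truncating to a fixed length
    enc = tokenizer(batch['text'], padding='max_length', truncation=True, max_length=4096)
    # Explicit global attention: 1 on the first (CLS) token, 0 everywhere else,
    # so the model doesn't have to build this mask itself on every step
    enc['global_attention_mask'] = [[1] + [0] * (len(ids) - 1) for ids in enc['input_ids']]
    return enc

# 'text' is a placeholder for whatever column holds the raw inputs
train_dataset = train_dataset.map(encode, batched=True)
test_dataset = test_dataset.map(encode, batched=True)

Would that be the expected fix here, or is the repeated initialization a sign of something else going wrong?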