"Initializing global attention on CLS token" on Longformer Training

rosenjcb · October 7, 2021, 7:23pm

I have a text classification task modeled on the MNLI task in run_glue.py. The premise is a text that averages 2k tokens and the hypothesis is a text that is 200 tokens long. The labels remain the same (0 for entailment, 1 for neutral, 2 for contradiction). I set the train and eval batch sizes to 1, since anything larger maxed out the 16 GB of VRAM on my SageMaker GPU, and started the training job. It's been around 2 hours now and I keep seeing the "Initializing global attention on CLS token" message. I'm not even sure the model has started the first epoch yet. For context, here are my hyperparameters:

hyperparameters={'model_name_or_path': 'allenai/longformer-base-4096',
                 'task_name': 'mnli',
                 'max_seq_length': 4096,
                 'do_train': True,
                 'do_eval': True,
                 'per_device_train_batch_size': 1,
                 'per_device_eval_batch_size': 1,
                 'output_dir': '/opt/ml/model',
                 'learning_rate': 2e-5,
                 'max_steps': 500,
                 'num_train_epochs': 3}

I have experience training transformers, but usually with a model like ALBERT. I've never trained a Longformer, so I want to know whether I should be prepared to wait longer than a day or two.
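For a rough sense of scale: in the Hugging Face Trainer, a positive max_steps overrides num_train_epochs, so the configuration above should stop after 500 optimizer steps rather than running 3 full epochs. A minimal sketch of that arithmetic follows, assuming a single device and no gradient accumulation (neither is stated in the post). The helper name and the dataset size are hypothetical placeholders, not values from the actual job.

```python
import math


def planned_steps(max_steps, num_train_epochs, num_examples, batch_size):
    """Estimate how many optimizer steps a run will perform.

    Assumes one device and gradient_accumulation_steps == 1.
    """
    if max_steps > 0:
        # A positive max_steps takes precedence over num_train_epochs.
        return max_steps
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * num_train_epochs


# num_examples is a made-up placeholder; the post does not give the dataset size.
with_cap = planned_steps(max_steps=500, num_train_epochs=3,
                         num_examples=10_000, batch_size=1)
without_cap = planned_steps(max_steps=-1, num_train_epochs=3,
                            num_examples=10_000, batch_size=1)
print(with_cap, without_cap)
```

Multiplying the step count by the seconds per step reported in the progress bar gives a wall-clock estimate once iterations start appearing.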

rosenjcb · October 7, 2021, 7:46pm

Oh thank goodness, the output started showing training iterations.
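On the message itself: as far as I know, LongformerForSequenceClassification logs it whenever no global_attention_mask is passed, and then places global attention on the first token (the CLS position). In the transformers versions of that period this happened inside the forward pass, so the line repeating does not by itself mean training is stuck. Below is a sketch of the equivalent mask using plain lists. The helper name is hypothetical, and in a real run the mask would be a tensor passed to the model as global_attention_mask.

```python
def cls_global_attention_mask(batch_size, seq_len):
    """Build a mask with 1 at position 0 (CLS) and 0 everywhere else.

    1 marks a token that gets global attention; 0 marks local attention only.
    """
    mask = [[0] * seq_len for _ in range(batch_size)]
    for row in mask:
        row[0] = 1  # global attention on the CLS token only
    return mask


# Matches the batch size and max_seq_length from the hyperparameters above.
mask = cls_global_attention_mask(batch_size=1, seq_len=4096)
print(len(mask), len(mask[0]), sum(mask[0]))
```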
