"Initializing global attention on CLS token" on Longformer Training

I have a text classification task modeled on the MNLI task in run_glue.py. The premise is a text that is on average ~2k tokens long and the hypothesis is around 200 tokens. The labels are the same (0 for entailment, 1 for neutral, 2 for contradiction). I set the train and eval batch sizes to 1, since anything larger maxed out my 16 GB VRAM SageMaker card, and started the training job. It's been around two hours now and I keep seeing the "Initializing global attention on CLS token" message; I'm not even sure the model has started the first epoch yet. For context, here are my hyperparameters:
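If it helps to see what I mean by the input format, here's my own rough sketch (not the library's actual code) of how I understand a premise/hypothesis pair gets packed into one sequence. I'm assuming Longformer inherits RoBERTa-style special tokens, i.e. `<s> premise </s></s> hypothesis </s>`, before padding to `max_seq_length`; the token IDs below are made up for illustration.

```python
def pack_pair(premise_ids, hypothesis_ids, bos=0, eos=2):
    """Pack a premise/hypothesis pair RoBERTa-style:
    <s> premise </s></s> hypothesis </s>  (IDs assumed: bos=0, eos=2)."""
    return [bos] + premise_ids + [eos, eos] + hypothesis_ids + [eos]

# Toy IDs just to show the layout of the packed sequence.
print(pack_pair([10, 11], [20]))  # [0, 10, 11, 2, 2, 20, 2]
```

With a ~2k-token premise plus a ~200-token hypothesis, the packed pair easily fits in the 4096 window, which is why I went with Longformer in the first place.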

```python
hyperparameters={'model_name_or_path': 'allenai/longformer-base-4096',
                 'task_name': 'mnli',
                 'max_seq_length': 4096,
                 'do_train': True,
                 'do_eval': True,
                 'per_device_train_batch_size': 1,
                 'per_device_eval_batch_size': 1,
                 'output_dir': '/opt/ml/model',
                 'learning_rate': 2e-5,
                 'max_steps': 500,
                 'num_train_epochs': 3}
```
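As far as I can tell, the log message just means the model is building a global attention mask with a 1 on the CLS position (so that token attends to the whole sequence) and 0 everywhere else. Here's my own minimal sketch of that idea, not the library's implementation:

```python
def make_global_attention_mask(seq_len, global_positions=(0,)):
    """Return a list of 0/1 flags, where 1 marks a position that gets
    global attention. For classification, only CLS (index 0) is global."""
    mask = [0] * seq_len
    for pos in global_positions:
        mask[pos] = 1
    return mask

# Short length just for display; in my run this would be length 4096.
print(make_global_attention_mask(8))  # [1, 0, 0, 0, 0, 0, 0, 0]
```

So seeing the message repeatedly shouldn't itself mean anything is stuck, if my understanding is right.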

I have experience training transformers before, but usually with a model like ALBERT. I've never used Longformer, so I want to know whether I should be prepared to wait longer than a day or two.
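One thing I did to sanity-check my expectations: since `max_steps` overrides `num_train_epochs` in the Hugging Face Trainer, my run only does 500 optimizer steps at batch size 1. The seconds-per-step figure below is a pure guess on my part for a 4096-token step on a 16 GB card, just to get an order of magnitude:

```python
def estimated_minutes(steps=500, seconds_per_step=4.0):
    """Back-of-envelope training time: steps * seconds per step.
    seconds_per_step=4.0 is an assumed number, not a measurement."""
    return steps * seconds_per_step / 60

print(round(estimated_minutes(), 1))  # ~33.3 minutes at the assumed rate
```

So even if a step is several times slower than my guess, it shouldn't take days once the iterations actually start.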

Oh thank goodness, the output started showing training iterations. :joy: