I have a text classification task that follows the MNLI task in run_glue.py. The premise is a text that averages about 2k tokens, and the hypothesis is a text that is 200 tokens long. The labels remain the same (0 for entailment, 1 for neutral, 2 for contradiction). I set the train and eval batch sizes to 1, since anything larger maxed out the 16 GB of VRAM on my SageMaker GPU, and started the training job. It has been running for around 2 hours now and I keep seeing the "Initializing global attention on CLS token" message. I'm not even sure the model has started the first epoch yet. For context, here are my hyperparameters:
hyperparameters={'model_name_or_path': 'allenai/longformer-base-4096',
                 'task_name': 'mnli',
                 'max_seq_length': 4096,
                 'do_train': True,
                 'do_eval': True,
                 'per_device_train_batch_size': 1,
                 'per_device_eval_batch_size': 1,
                 'output_dir': '/opt/ml/model',
                 'learning_rate': 2e-5,
                 'max_steps': 500,
                 'num_train_epochs': 3}
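As a rough sanity check on these settings, here is a minimal sketch of how many optimizer steps the run should perform. It assumes the documented Trainer behavior that a positive max_steps overrides num_train_epochs, and it ignores gradient accumulation. The function name and the dataset size of 10,000 examples are hypothetical placeholders, not values from this job.

```python
import math


def total_optimizer_steps(num_examples, per_device_batch_size, num_epochs,
                          max_steps=-1, num_devices=1):
    """Estimate the number of optimizer steps for a training run.

    Assumption: a positive max_steps takes precedence over num_epochs,
    and the last partial batch of each epoch is kept (not dropped).
    """
    if max_steps > 0:
        return max_steps
    examples_per_step = per_device_batch_size * num_devices
    steps_per_epoch = math.ceil(num_examples / examples_per_step)
    return steps_per_epoch * num_epochs


# Hypothetical dataset size, used only for illustration.
NUM_EXAMPLES = 10_000

# With max_steps set, the epoch count no longer determines the run length.
capped = total_optimizer_steps(NUM_EXAMPLES, 1, 3, max_steps=500)

# Without max_steps, batch size 1 means one step per example per epoch.
uncapped = total_optimizer_steps(NUM_EXAMPLES, 1, 3)

print(f"steps with max_steps=500: {capped}")
print(f"steps with 3 full epochs: {uncapped}")
```

Multiplying the step count by the seconds per step reported in the training log would then give a rough wall-clock estimate.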
I have trained transformer models before, but usually something like ALBERT. I have never trained a Longformer, so I want to know whether I should be prepared to wait longer than a day or two.