Error using `max_length` in transformers

I was under the assumption that the model automatically adjusts itself to the maximum sequence length, but while fine-tuning RoBERTa-large I am getting an error during training that the sequence is too long and the indices are therefore out of range.

Is there a way to set the maximum sequence length as a parameter when using Trainer with PyTorch? I tried to find one in the docs but was unable to. This surprised me, because earlier models worked perfectly fine.

There is a way to do it when tokenizing the data, by setting the max_length parameter:

train_encodings = tokenizer(seq_train, truncation=True,
                            padding=True, max_length=1024)
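
For reference, this is how I checked what limits the checkpoint itself reports (just a diagnostic sketch, separate from my training script; the names tok and cfg are only used for this check):

from transformers import AutoTokenizer, AutoConfig

# Both values come straight from the roberta-large checkpoint on the hub.
tok = AutoTokenizer.from_pretrained("roberta-large")
cfg = AutoConfig.from_pretrained("roberta-large")

print(tok.model_max_length)            # 512 -> longest input the tokenizer will emit
print(cfg.max_position_embeddings)     # 514 -> position table size (512 + 2 reserved offsets)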

With the tokenizer settings above (max_length=1024), I get the following traceback when trying to train:

Some weights of the model checkpoint at roberta-large were not used when initializing RobertaForSequenceClassification: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-large and are newly initialized: ['classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.weight', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

---------------------------------------------------------------------------

RuntimeError                              Traceback (most recent call last)

<ipython-input-25-bb6a14612ca7> in <module>()
     46 )
     47 
---> 48 train_results = trainer.train()

17 frames

/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, **kwargs)
    938                         tr_loss += self.training_step(model, inputs)
    939                 else:
--> 940                     tr_loss += self.training_step(model, inputs)
    941                 self._total_flos += self.floating_point_ops(inputs)
    942 

/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in training_step(self, model, inputs)
   1300         if self.use_amp:
   1301             with autocast():
-> 1302                 loss = self.compute_loss(model, inputs)
   1303         else:
   1304             loss = self.compute_loss(model, inputs)

/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs)
   1332         else:
   1333             labels = None
-> 1334         outputs = model(**inputs)
   1335         # Save past state if it exists
   1336         # TODO: this needs to be fixed and made cleaner later.

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
   1153             output_attentions=output_attentions,
   1154             output_hidden_states=output_hidden_states,
-> 1155             return_dict=return_dict,
   1156         )
   1157         sequence_output = outputs[0]

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/transformers/models/roberta/modeling_roberta.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    815             output_attentions=output_attentions,
    816             output_hidden_states=output_hidden_states,
--> 817             return_dict=return_dict,
    818         )
    819         sequence_output = encoder_outputs[0]

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/transformers/models/roberta/modeling_roberta.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
    512                     encoder_attention_mask,
    513                     past_key_value,
--> 514                     output_attentions,
    515                 )
    516 

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/transformers/models/roberta/modeling_roberta.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
    397             head_mask,
    398             output_attentions=output_attentions,
--> 399             past_key_value=self_attn_past_key_value,
    400         )
    401         attention_output = self_attention_outputs[0]

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/transformers/models/roberta/modeling_roberta.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
    327             encoder_attention_mask,
    328             past_key_value,
--> 329             output_attentions,
    330         )
    331         attention_output = self.output(self_outputs[0], hidden_states)

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/transformers/models/roberta/modeling_roberta.py in forward(self, hidden_states, attention_mask, head_mask, encoder_hidden_states, encoder_attention_mask, past_key_value, output_attentions)
    184         output_attentions=False,
    185     ):
--> 186         mixed_query_layer = self.query(hidden_states)
    187 
    188         # If this is instantiated as a cross-attention module, the keys

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    725             result = self._slow_forward(*input, **kwargs)
    726         else:
--> 727             result = self.forward(*input, **kwargs)
    728         for hook in itertools.chain(
    729                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py in forward(self, input)
     91 
     92     def forward(self, input: Tensor) -> Tensor:
---> 93         return F.linear(input, self.weight, self.bias)
     94 
     95     def extra_repr(self) -> str:

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in linear(input, weight, bias)
   1690         ret = torch.addmm(bias, input, weight.t())
   1691     else:
-> 1692         output = input.matmul(weight.t())
   1693         if bias is not None:
   1694             output += bias

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
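
For what it's worth, the cuBLAS error itself is fairly opaque. One way to surface the underlying indexing error would be to push a single example through the model on the CPU, using the model and train_dataset from the code further down (a rough sketch only, assuming train_dataset[0] returns a dict of tensors the way the Trainer expects):

import torch

# On the CPU the failure usually shows up as the real message
# ("index out of range in self" from the position embeddings) instead of
# CUBLAS_STATUS_ALLOC_FAILED. Setting CUDA_LAUNCH_BLOCKING=1 before launching
# Python is the GPU-side equivalent.
cpu_model = model.to("cpu")
batch = {k: torch.as_tensor(v).unsqueeze(0) for k, v in train_dataset[0].items()}
with torch.no_grad():
    cpu_model(**batch)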


The code I used is this:

from transformers import RobertaForSequenceClassification, Trainer, TrainingArguments

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='weighted', zero_division=1)  # average=None would instead give a score for each class
    acc = accuracy_score(labels, preds)
    return {
        'accuracy': acc,
        'f1': f1,
        'precision': precision,
        'recall': recall
    }

training_args = TrainingArguments(
    output_dir='/content/results/',          # output directory
    overwrite_output_dir = True,
    num_train_epochs=16,              # total number of training epochs
    per_device_train_batch_size=2,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='/content/logs',            # directory for storing logs
    logging_steps=10,
    evaluation_strategy='epoch',
    learning_rate=1e-5,
    fp16 = True,
    load_best_model_at_end = True,
    metric_for_best_model = 'eval_accuracy',
    greater_is_better = True,
    seed = 101,
    save_total_limit=5
)

model = RobertaForSequenceClassification.from_pretrained("roberta-large", num_labels=20)

trainer = Trainer(
    model=model,                         # the instantiated Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=val_dataset,             # evaluation dataset
    compute_metrics=compute_metrics
)

train_results = trainer.train()
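
For completeness, the obvious workaround would be to clamp max_length to what the tokenizer reports instead of hard-coding 1024 (a sketch, reusing seq_train from above), but I would still like to know whether there is a Trainer-level parameter for this:

# Cap the tokenizer call at the model's own limit (512 for roberta-large).
max_len = min(1024, tokenizer.model_max_length)
train_encodings = tokenizer(seq_train, truncation=True,
                            padding=True, max_length=max_len)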

Any help at all? Or should I open a GitHub issue?