Facebook BART Fine-tuning - Transformers CUDA error: CUBLAS_STATUS_NOT_INITIALIZE

I’m trying to finetune the Facebook BART model, I’m following this article in order to classify text using my own dataset.

And I’m using the Trainer object in order to train:

training_args = TrainingArguments(
    output_dir=model_directory,      # output directory
    num_train_epochs=1,              # total number of training epochs - 3
    per_device_train_batch_size=4,  # batch size per device during training - 16
    per_device_eval_batch_size=16,   # batch size for evaluation - 64
    warmup_steps=50,                # number of warmup steps for learning rate scheduler - 500
    weight_decay=0.01,               # strength of weight decay
    logging_dir=model_logs,          # directory for storing logs

model = BartForSequenceClassification.from_pretrained("facebook/bart-base") # bart-large-mnli

trainer = Trainer(
    model=model,                          # the instantiated 🤗 Transformers model to be trained
    args=training_args,                   # training arguments, defined above
    compute_metrics=new_compute_metrics,  # a function to compute the metrics
    train_dataset=train_dataset,          # training dataset
    eval_dataset=val_dataset              # evaluation dataset

This is the tokenizer I used:

from transformers import BartTokenizerFast
tokenizer = BartTokenizerFast.from_pretrained('facebook/bart-base')

But when I use trainer.train() I get the following:

Printing the following:

***** Running training *****
  Num examples = 172
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 11

Followed by this error:

RuntimeError: Caught RuntimeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/databricks/python/lib/python3.9/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/databricks/python/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/databricks/python/lib/python3.9/site-packages/transformers/models/bart/modeling_bart.py", line 1496, in forward
    outputs = self.model(
  File "/databricks/python/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/databricks/python/lib/python3.9/site-packages/transformers/models/bart/modeling_bart.py", line 1222, in forward
    encoder_outputs = self.encoder(
  File "/databricks/python/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/databricks/python/lib/python3.9/site-packages/transformers/models/bart/modeling_bart.py", line 846, in forward
    layer_outputs = encoder_layer(
  File "/databricks/python/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/databricks/python/lib/python3.9/site-packages/transformers/models/bart/modeling_bart.py", line 323, in forward
    hidden_states, attn_weights, _ = self.self_attn(
  File "/databricks/python/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/databricks/python/lib/python3.9/site-packages/transformers/models/bart/modeling_bart.py", line 191, in forward
    query_states = self.q_proj(hidden_states) * self.scaling
  File "/databricks/python/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/databricks/python/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`

I’ve searched this site and GitHub and Stackoverflow but still didn’t find anything that helped me fix this for me (I tried adding more memory, lowering batches and warmup, restarting, specifying CPU or GPU, and more, but none worked for me)

I’m using Databricks for that, with the cluster: Standard_NC24s_v3 with 4 GPUs, 2 to 6 workers

If you require any other information, comment and I’ll add it up asap

hi @LidorPrototype ,

First, check your own dataset will match with facebook/bart-base 's vocab size.
Second, check word that in your own dataset is in model’s vocabulary.
Third, check model, dataloader, tokenizer config is correct. There many mistake are config’s wrong path or config setting is wrong.

As you see this article, if your own dataset contain some data that dose not support at model’s support vocab, you can face that error.

If you want to check detail, please debug some easy code like docs example code.
Follow line with your own dataset, you can see trackback message more detail.
(Some data not contain vocab, mismatch config, etc…)

bart docs page

from transformers import AutoTokenizer, BartModel
import torch

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")
model = BartModel.from_pretrained("facebook/bart-base")

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state

How can I find the vocab size of bart plus how do I calculate it for my own dataset?

Im not using a data loader, and the tokenizer I copied from the article I specified but with the model I require, Im not adding some special configs, should I add them?

My dataset is holding private information but here is an image of how it’s built:


I was able to get it working only by passing the following:

  • num_labels with the actual number of labels to the from_pretrained
  • Making sure that labels order start from 0 and not from 1
  • passing ignore_mismatched_sizes=True to the from_pretrained
  • Allowing the flag gradient_checkpointing=True in the TrainingArguments