Masked language model for BART (Not BERT)

Hi, I’m trying to train a BART model using masking (MLM).
The model type is BartForConditionalGeneration. The task I have is text generation (key phrases) from an input text.

Before trying it on a custom dataset, I wanted to try it on the official Hugging Face example here, which is in fact similar to the Hugging Face GitHub example.

To save space and not paste the entire code as is, I changed the model to one suited for my task that I found on Hugging Face. [Everything else is the same, plus enabling this variable to help with CUDA stack debugging: os.environ['CUDA_LAUNCH_BLOCKING'] = "1"]

model_checkpoint = "distilbert-base-uncased"
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)
to

model_checkpoint = "memray/bart-wikikp"
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)
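
For completeness, here is the debug flag mentioned above as a small sketch; the comment reflects my understanding, so treat it as an assumption rather than the example's own code:

import os

# Set before any CUDA work so kernel launches run synchronously and the
# failing op shows up at the right place in the stack trace.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"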

Based on the provided documentation, this unsupervised approach is viable if one wants to fine-tune the model for a specific domain: before fine-tuning, masked language modelling helps acquaint the model with the new corpus first. Also, the documentation does not mention that this is restricted to specific tasks, e.g. only applicable to QA (question answering) or text classification.

Please note that in the first link above the IMDB dataset is used.
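
The masking itself comes from the data collator in that example. A minimal sketch of that setup, assuming the same DataCollatorForLanguageModeling call as the course notebook, just pointed at the BART checkpoint above:

from transformers import AutoTokenizer, DataCollatorForLanguageModeling

# Assumption: same collator as the linked example, with the BART checkpoint swapped in.
tokenizer = AutoTokenizer.from_pretrained("memray/bart-wikikp")
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm_probability=0.15
)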

The errors I get are CUDA related (using a GPU) and appear when training:

trainer.train()

Error:
***** Running training *****
Num examples = 10000
Num Epochs = 3
Instantaneous batch size per device = 32
Total train batch size (w. parallel, distributed & accumulation) = 32
Gradient Accumulation steps = 1
Total optimization steps = 939
0%| | 0/939 [00:00<?, ?it/s]/opt/conda/conda-bld/pytorch_1646755953518/work/aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [42,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1646755953518/work/aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [42,0,0], thread: [33,0,0] Assertion srcIndex < srcSelectDimSize failed.
/opt/conda/conda-bld/pytorch_1646755953518/work/aten/src/ATen/native/cuda/Indexing.cu:703: indexSelectLargeIndex: block: [42,0,0], thread: [34,0,0] Assertion srcIndex < srcSelectDimSize failed.

Traceback (most recent call last):
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3457, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in
trainer.train()
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/transformers/trainer.py", line 1413, in train
ignore_keys_for_eval=ignore_keys_for_eval,
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/transformers/trainer.py", line 1651, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/transformers/trainer.py", line 2345, in training_step
loss = self.compute_loss(model, inputs)
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/transformers/trainer.py", line 2377, in compute_loss
outputs = model(**inputs)
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/transformers/models/bart/modeling_bart.py", line 1368, in forward
return_dict=return_dict,
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/transformers/models/bart/modeling_bart.py", line 1229, in forward
return_dict=return_dict,
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/transformers/models/bart/modeling_bart.py", line 850, in forward
output_attentions=output_attentions,
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/transformers/models/bart/modeling_bart.py", line 327, in forward
output_attentions=output_attentions,
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/transformers/models/bart/modeling_bart.py", line 191, in forward
query_states = self.q_proj(hidden_states) * self.scaling
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "/home/haddad/.conda/envs/hugg/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 103, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)

If I use the Accelerate version instead, I get:

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

Everything I read says that this happens when there is a mismatch between the model head, the input tensor, or other tensors. It seems BART needs something extra to be adapted for MLM?!

[UPDATE]
Commenting out the data_collator argument gets trainer.train() to work; otherwise CUDA runs into lots of issues, with or without the Accelerate version.

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=downsampled_dataset["train"],
    eval_dataset=downsampled_dataset["test"],
    # data_collator=data_collator,
)

One thing I suspect is that the model's input embedding is vocab size (50265) x 1024, whereas the IMDB data is chunked into sequences of 128.
How can I change the model to adapt to that input dimension? 1024 → 128

Hi @ahadda5,

Is there a config setting that is different? For example, the config's max length or hidden layer dimension.

HF bart config docs
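
A quick way to check is to load the config and print the relevant fields; a rough sketch (attribute names as in BartConfig):

from transformers import AutoConfig

# Compare these against the tokenizer size and the 128-token chunks.
config = AutoConfig.from_pretrained("memray/bart-wikikp")
print(config.vocab_size)               # number of embedding rows
print(config.d_model)                  # hidden size (likely the 1024 seen above)
print(config.max_position_embeddings)  # longest sequence the model accepts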

Also, if you want to build BART for masked LM, add a final layer on top that projects the hidden states to your output dimension (the vocab).

For example, in BertForMaskedLM, the class BertLMPredictionHead(nn.Module) maps the hidden dimension to the output (vocab) dimension:

self.decoder = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

You can take a hint from BertForMaskedLM's layer structure to build a BART masked LM.
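
A rough sketch of that kind of head (not the actual BertLMPredictionHead, just the projection idea, with hypothetical names):

import torch.nn as nn

class SimpleLMHead(nn.Module):
    """Project final hidden states onto the vocabulary for MLM scoring."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.decoder = nn.Linear(hidden_size, vocab_size, bias=False)

    def forward(self, hidden_states):
        # (batch, seq_len, hidden_size) -> (batch, seq_len, vocab_size)
        return self.decoder(hidden_states)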

Hope this helps.

Thanks for your reply. I will investigate further.
But from debugging the BART beast:
the encoder layer dims are consistent with the decoder layers,
and the labels are consistent with the input ids for masking.

I'll take a look at BERT.

Okay, so the issue was that the model used has a vocab size of 50264, while the tokenizer has a size of 50265!

So I had to resize_token_embeddings on the model to match the tokenizer's size!
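Roughly, the fix looks like this (a sketch, assuming model and tokenizer are the ones loaded above):

# Grow the embedding matrix to match the tokenizer's vocabulary size.
model.resize_token_embeddings(len(tokenizer))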
Thanks @sgugger, @cog for the guidance. Happy coding :slight_smile: !
