Hi there, I ran into an issue running the above script in Colab. The script is at the following link: transformers/run_semantic_segmentation.py at main · huggingface/transformers · GitHub
The training args I used are below:
!python /content/drive/MyDrive/run_semantic_segmentation.py \
--model_name_or_path nvidia/mit-b5 \
--dataset_name nickmuchi/rugd-dataset-all \
--output_dir /content/drive/MyDrive/segformer-finetuned-rugd-out \
--remove_unused_columns False \
--do_train \
--do_eval \
--evaluation_strategy steps \
--push_to_hub \
--push_to_hub_model_id segformer-finetuned-rugd \
--max_steps 10000 \
--learning_rate 0.00006 \
--lr_scheduler_type polynomial \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--logging_strategy steps \
--logging_steps 100 \
--evaluation_strategy epoch \
--save_strategy epoch \
--save_total_limit 2 \
--load_best_model_at_end True \
--seed 1337 \
--max_train_samples 3000 \
--max_eval_samples 500
Error:
Traceback (most recent call last):
  File "/content/drive/MyDrive/run_semantic_segmentation.py", line 508, in <module>
    main()
  File "/content/drive/MyDrive/run_semantic_segmentation.py", line 483, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1324, in train
    ignore_keys_for_eval=ignore_keys_for_eval,
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1559, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 2206, in training_step
    loss.backward()
  File "/usr/local/lib/python3.7/dist-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`
0% 0/10000 [00:00<?, ?it/s]
I tried googling the error but did not find much. Thanks.