CUDA runtime error during ASR training

Hello

I have searched for this problem but haven't found a solution yet.
PC specs:

Acer Predator Helios 300
* GPU: GTX 1660 Ti
* CPU: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
* RAM: 16GB

I am trying to train a Georgian model by following a tutorial on Hugging Face.

The example is in the “transformers/examples/pytorch/speech-recognition” directory.

I have installed all the necessary libraries in a Python virtual environment:

* transformers 4.22.0.dev0
* torch 1.12.1
* torchaudio 0.12.1
* torchvision 0.13.1

and the example-specific ones:

* datasets 2.4.0
* torchaudio
* librosa
* jiwer
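
For reference, the versions listed above can be read from inside the virtual environment rather than typed by hand. This is only a minimal sketch using the standard library; `report_versions` and `PACKAGES` are names chosen here for illustration and are not part of any of the libraries above.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages named in the two lists above.
PACKAGES = ["transformers", "torch", "torchaudio", "torchvision",
            "datasets", "librosa", "jiwer"]


def report_versions(packages):
    """Return {package name: installed version, or 'not installed'}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not visible from the current interpreter.
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    for name, ver in report_versions(PACKAGES).items():
        print(f"{name:<14}{ver}")
```

Running this with the virtual environment's own `python3` should show whether the interpreter that runs the training script sees the same versions as `pip`.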

When I run the example with the following command:

python3 run_speech_recognition_ctc.py \
        --dataset_name="common_voice" \
        --model_name_or_path="facebook/wav2vec2-large-xlsr-53" \
        --dataset_config_name="ka" \
        --output_dir="./wav2vec2-common_voice-ka-demo" \
        --overwrite_output_dir \
        --num_train_epochs="15" \
        --per_device_train_batch_size="16" \
        --gradient_accumulation_steps="2" \
        --learning_rate="3e-4" \
        --warmup_steps="500" \
        --evaluation_strategy="steps" \
        --text_column_name="sentence" \
        --length_column_name="input_length" \
        --save_steps="400" \
        --eval_steps="100" \
        --layerdrop="0.0" \
        --save_total_limit="3" \
        --freeze_feature_encoder \
        --gradient_checkpointing \
        --chars_to_ignore , '?' . ! - \; \: \" “ % ‘ ” � \
        --fp16 \
        --group_by_length \
        --push_to_hub=false \
        --do_train --do_eval

I get the following error:

The following columns in the training set don't have a corresponding argument in `Wav2Vec2ForCTC.forward` and have been ignored: input_length. If input_length are not expected by `Wav2Vec2ForCTC.forward`,  you can safely ignore this message.
/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
  warnings.warn(
***** Running training *****
  Num examples = 1585
  Num Epochs = 15
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 32
  Gradient Accumulation steps = 2
  Total optimization steps = 750
  0%|          | 0/750 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/pavle/Dev/ai/Georgian/transformers/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py", line 770, in <module>
    main()
  File "/home/pavle/Dev/ai/Georgian/transformers/examples/pytorch/speech-recognition/run_speech_recognition_ctc.py", line 718, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/transformers/trainer.py", line 1505, in train
    return inner_training_loop(
  File "/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/transformers/trainer.py", line 1747, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/transformers/trainer.py", line 2477, in training_step
    loss = self.compute_loss(model, inputs)
  File "/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/transformers/trainer.py", line 2509, in compute_loss
    outputs = model(**inputs)
  File "/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 1660, in forward
    outputs = self.wav2vec2(
  File "/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 1301, in forward
    hidden_states, extract_features = self.feature_projection(extract_features)
  File "/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/transformers/models/wav2vec2/modeling_wav2vec2.py", line 482, in forward
    hidden_states = self.projection(norm_hidden_states)
  File "/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pavle/Dev/ai/Georgian/.env/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)`
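
The last frames of the traceback show the failure inside `F.linear`, in a `cublasGemmEx` call with half-precision (`CUDA_R_16F`) operands. One way to narrow this down might be to run a single half-precision linear layer outside the Trainer. The sketch below is a diagnostic under assumptions, not a fix: `check_fp16_linear` is a hypothetical helper, and the tensor sizes are arbitrary values picked for illustration.

```python
def check_fp16_linear(batch=16, in_features=512, out_features=1024):
    """Run one float16 linear layer on the GPU and describe the outcome."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"

    device = torch.device("cuda")
    # Illustrative shapes only; they are not read from the model config.
    x = torch.randn(batch, in_features, device=device, dtype=torch.float16)
    w = torch.randn(out_features, in_features, device=device, dtype=torch.float16)
    b = torch.zeros(out_features, device=device, dtype=torch.float16)
    try:
        y = torch.nn.functional.linear(x, w, b)
        # CUDA kernels run asynchronously; synchronize so errors surface here.
        torch.cuda.synchronize()
    except RuntimeError as err:
        return f"fp16 linear failed: {err}"
    name = torch.cuda.get_device_name(0)
    return f"fp16 linear ok on {name}, output shape {tuple(y.shape)}"


if __name__ == "__main__":
    print(check_fp16_linear())
```

If this small check fails in the same way, the problem is probably in the GPU, driver, or CUDA setup rather than in the training script. If it passes, the training run itself (for example its batch size or the `--fp16` option) would be the next thing to look at.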

I tried different CUDA versions from pip, but I get the same error every time.
Can anyone point me in the right direction?

Thanks in advance