[Open-to-the-community] Whisper fine-tuning event

Hey @pierreguillou! Sorry about the delay, just merged a PR that adds them: [WFTE] Add non-zero ref filtering step by sanchit-gandhi · Pull Request #87 · huggingface/community-events · GitHub
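For reference, what that PR adds boils down to dropping any example whose reference transcription is empty, since zero-length references break the WER computation at evaluation time. A minimal sketch with a toy dataset (the "sentence" column name follows the fine-tuning notebook; adapt it to your own setup):

from datasets import Dataset

# Toy stand-in for the processed Common Voice split used in the notebook
ds = Dataset.from_dict({"sentence": ["hello world", "", "   ", "namaste"]})

def is_reference_non_empty(sentence):
    # Zero-length references make the WER metric error out at eval time,
    # so drop any example whose transcription is empty after stripping.
    return len(sentence.strip()) > 0

ds = ds.filter(is_reference_non_empty, input_columns=["sentence"])
print(ds["sentence"])  # ['hello world', 'namaste']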

@steja, @sanchit-gandhi, hello. I am having the same problem as @steja:
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)

Is there a resolution to this?

For what it is worth, here is the output from transformers-cli and nvidia-smi:

$ transformers-cli env

  • transformers version: 4.26.0.dev0
  • Platform: Linux-4.18.0-372.26.1.el8_6.x86_64-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • Huggingface_hub version: 0.11.1
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes, A100
  • Using distributed or parallel set-up in script?: No

$ nvidia-smi

Wed Dec 21 21:23:10 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A10 On | 00000000:25:00.0 Off | 0 |
| 0% 29C P8 21W / 150W | 0MiB / 23028MiB | 0% E. Process |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100 80G… On | 00000000:81:00.0 Off | 0 |
| N/A 26C P0 42W / 300W | 0MiB / 81920MiB | 0% E. Process |
| | | Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

Any update will be appreciated. Thanks a lot.

Hey @vkg! Sorry for the late reply here! Have you implemented the additional filtering step outlined in [Open-to-the-community] Whisper fine-tuning event - #21 by sanchit-gandhi? If so, and the error still persists, could you try the workarounds listed in this PyTorch thread?
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)` - #13 by NicoHambauer - PyTorch Forums

Let me know if this continues to be an issue - more than happy to dig further on this!
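One quick sanity check is a bare half-precision matmul on the target GPU, completely independent of Whisper: if the minimal snippet below raises the same CUBLAS error, the problem is the local PyTorch/CUDA build rather than the fine-tuning script (a sketch, assuming a single visible CUDA device):

import torch

device = torch.device("cuda:0")

# A bare fp16 GEMM, which is what cublasGemmEx is ultimately called for
a = torch.randn(512, 512, device=device, dtype=torch.float16)
b = torch.randn(512, 512, device=device, dtype=torch.float16)
c = a @ b
torch.cuda.synchronize()

print("fp16 matmul OK:", tuple(c.shape))
print("torch:", torch.__version__, "| built with CUDA:", torch.version.cuda)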

@sanchit-gandhi, thank you for getting back to me.

Unfortunately, the first link in your response does not seem to be related to the problem I am facing, and the second link does not quite help.

So, this still remains an issue, I am afraid. Thanks again for your time.

The same code runs fine in the Google Colab environment; it just does not seem to work in my environment, the details of which are in my initial post.

This is very cool! I’ve been working hard using Whisper on Hebrew, but I missed this specific event. Will there be a follow-up?

@sanchit-gandhi Hi, seconding @vkg's issue. The two resources you linked don't address the core problem here: the Whisper fine-tuning demo you authored works fine in Colab, but returns the following error when run in a local notebook: RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)
I have verified this isn't an issue with GPU memory, and it shouldn't be a dimension mismatch, as the dataset processing is identical to the demo. Could you please take a second look at this issue? Thank you.

Hey @avoy and @vkg!

I’ve re-run everything on a local GPU and it all works on my side: sanchit-gandhi/whisper-small-hi-rerun · Hugging Face

Based on the RuntimeError message alone I can’t pinpoint what it could be, but I don’t think it’s an error with the script (since it runs on Colab and on a local GPU). It could be a build problem with PyTorch and CUDA? IMO it’s worth posting the full traceback on the PyTorch forums since it seems like the problem lies here!

The only other thing I think is worth trying is disabling fp16 and seeing if that changes the error (maybe an AMP backend problem?).
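In the Trainer that's just a flag on the training arguments, e.g. something like the below (the other values are illustrative, not the exact ones from the notebook):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-hi",
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    max_steps=4000,
    fp16=False,  # disable mixed precision to rule out an AMP/fp16 kernel issue
    evaluation_strategy="steps",
    predict_with_generate=True,
)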

Hey @Sharonio! That’s really great to hear! We’ll definitely run more speech events in the future with whatever models/datasets are most exciting at the time! Feel free to ask if you have any questions regarding Whisper fine-tuning, as we’re still super keen to help where possible!

@sanchit-gandhi, thank you so much for following up, I appreciate it. I was able to solve the problem. The culprit was an older CUDA version. Thanks, again.
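For anyone else hitting this, a quick way to see which CUDA toolkit your PyTorch build was compiled against (and compare it with the driver's CUDA version reported by nvidia-smi) is something like:

import torch

print("torch:", torch.__version__)
print("built with CUDA:", torch.version.cuda)   # toolkit version PyTorch was compiled against
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
    print("compute capability:", torch.cuda.get_device_capability(0))

A large mismatch between that toolkit version and the driver's supported CUDA version, or a toolkit that is too old for the card, can surface as opaque CUBLAS errors like the one above.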

Hey @vkg! Awesome to hear that! Best of luck with your Whisper fine-tuning runs - very excited to see what you build!

Hello,

I am running into an issue with this.

As I start the data preprocessing, it takes up most of my disk space on Google Colab, and I cannot continue with setting up the rest of the training process. I thought I would save the preprocessed Common Voice dataset, download it locally, then start a new Colab notebook and upload the dataset to continue with training, but I can't even do that since there is no disk space left. I would appreciate your assistance on this.

Best
Layla
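One possible workaround (a sketch, assuming the 🤗 Datasets streaming API; the dataset and config names follow the blog post) is to stream Common Voice instead of caching the whole dataset and its preprocessed copies on the Colab disk:

from datasets import load_dataset, Audio

# Streaming mode downloads and decodes examples on the fly rather than
# writing the full dataset to disk. Note that common_voice_11_0 is gated,
# so you need to accept its terms on the Hub and be logged in.
common_voice = load_dataset(
    "mozilla-foundation/common_voice_11_0", "hi", split="train", streaming=True
)
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))

# Any .map() preprocessing is then applied lazily, one example at a time:
first = next(iter(common_voice))
print(first["sentence"], first["audio"]["array"].shape)

With streaming, the preprocessing runs lazily during training instead of being materialised up front, so the Colab disk never has to hold the processed dataset.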

Hi, I hope you’re doing well. I am trying to fine-tune Whisper small on English Common Voice data as you did in the blog post, but I can’t solve the error that appears during training, after the first epoch completes:
TypeError: argument of type ‘NoneType’ is not iterable
I would really appreciate a quick reply.

Here is the full exception after completing the first epoch:
TypeError Traceback (most recent call last)
Cell In[35], line 1
----> 1 trainer.train()

File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1555, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1553 hf_hub_utils.enable_progress_bars()
1554 else:
-> 1555 return inner_training_loop(
1556 args=args,
1557 resume_from_checkpoint=resume_from_checkpoint,
1558 trial=trial,
1559 ignore_keys_for_eval=ignore_keys_for_eval,
1560 )

File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:1922, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
1919 self.state.epoch = epoch + (step + 1 + steps_skipped) / steps_in_epoch
1920 self.control = self.callback_handler.on_step_end(args, self.state, self.control)
-> 1922 self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
1923 else:
1924 self.control = self.callback_handler.on_substep_end(args, self.state, self.control)

File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:2271, in Trainer._maybe_log_save_evaluate(self, tr_loss, model, trial, epoch, ignore_keys_for_eval)
2269 metrics.update(dataset_metrics)
2270 else:
-> 2271 metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
2272 self._report_to_hp_search(trial, self.state.global_step, metrics)
2274 # Run delayed LR scheduler now that metrics are populated

File /usr/local/lib/python3.10/dist-packages/transformers/trainer_seq2seq.py:165, in Seq2SeqTrainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix, **gen_kwargs)
162 gen_kwargs["num_beams"] = self.args.generation_num_beams
163 self._gen_kwargs = gen_kwargs
-> 165 return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)

File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:3011, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
3008 start_time = time.time()
3010 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3011 output = eval_loop(
3012 eval_dataloader,
3013 description="Evaluation",
3014 # No point gathering the predictions if there are no metrics, otherwise we defer to
3015 # self.args.prediction_loss_only
3016 prediction_loss_only=True if self.compute_metrics is None else None,
3017 ignore_keys=ignore_keys,
3018 metric_key_prefix=metric_key_prefix,
3019 )
3021 total_batch_size = self.args.eval_batch_size * self.args.world_size
3022 if f"{metric_key_prefix}_jit_compilation_time" in output.metrics:

File /usr/local/lib/python3.10/dist-packages/transformers/trainer.py:3200, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
3197 batch_size = observed_batch_size
3199 # Prediction step
-> 3200 loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
3201 main_input_name = getattr(self.model, "main_input_name", "input_ids")
3202 inputs_decode = self._prepare_input(inputs[main_input_name]) if args.include_inputs_for_metrics else None

File /usr/local/lib/python3.10/dist-packages/transformers/trainer_seq2seq.py:266, in Seq2SeqTrainer.prediction_step(self, model, inputs, prediction_loss_only, ignore_keys, **gen_kwargs)
261 if not self.args.predict_with_generate or prediction_loss_only:
262 return super().prediction_step(
263 model, inputs, prediction_loss_only=prediction_loss_only, ignore_keys=ignore_keys
264 )
-> 266 has_labels = "labels" in inputs
267 inputs = self._prepare_inputs(inputs)
269 # Priority (handled in generate):
270 # non-None gen_kwargs > model.generation_config > default GenerationConfig()

TypeError: argument of type ‘NoneType’ is not iterable
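The failing line is has_labels = "labels" in inputs, so the batch the trainer received at evaluation time is None rather than a dict. A hedged first debugging step (the variable names follow the fine-tuning blog notebook, so adapt them to your own) is to run the data collator on a couple of eval examples by hand and check what comes back:

# Run inside the fine-tuning notebook, after the dataset and collator are built
samples = [common_voice["test"][i] for i in range(2)]
batch = data_collator(samples)

print(type(batch))   # should be a dict-like BatchFeature, not None
print(batch.keys())  # should contain "input_features" and "labels"

If the collator returns None for these examples, the evaluation loop hands the trainer a None batch and fails exactly as above.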