[Open-to-the-community] Whisper fine-tuning event

Hey @pierreguillou! Sorry about the delay! I've just merged a PR that adds them: [WFTE] Add non-zero ref filtering step by sanchit-gandhi · Pull Request #87 · huggingface/community-events · GitHub
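
In case it's useful, the gist of that filtering step is roughly the following. This is a minimal sketch only: it assumes a Hugging Face Datasets dataset with a "sentence" column, and the column name and normaliser choice here are illustrative - the PR above is the canonical version.

from transformers.models.whisper.english_normalizer import BasicTextNormalizer

normalizer = BasicTextNormalizer()

def is_reference_non_empty(reference):
    # drop examples whose reference transcription is empty after normalisation,
    # since zero-length references break the WER computation during evaluation
    return len(normalizer(reference).strip()) > 0

# `dataset` is your prepared split containing a "sentence" column (assumption)
dataset = dataset.filter(is_reference_non_empty, input_columns=["sentence"])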


@steja, @sanchit-gandhi, hello. I am having the same problem as @steja:
return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)

Is there a resolution to this?

For what it is worth, here is the output from transformers-cli and nvidia-smi:

$ transformers-cli env

  • transformers version: 4.26.0.dev0
  • Platform: Linux-4.18.0-372.26.1.el8_6.x86_64-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • Huggingface_hub version: 0.11.1
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes, A100
  • Using distributed or parallel set-up in script?: No

$ nvidia-smi

Wed Dec 21 21:23:10 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10          On   | 00000000:25:00.0 Off |                    0 |
|  0%   29C    P8    21W / 150W |      0MiB / 23028MiB |      0%   E. Process |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A100 80G…    On   | 00000000:81:00.0 Off |                    0 |
| N/A   26C    P0    42W / 300W |      0MiB / 81920MiB |      0%   E. Process |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                   |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Any update will be appreciated. Thanks a lot.

Hey @vkg! Sorry for the late reply here! Have you implemented the additional filtering step outlined in
[Open-to-the-community] Whisper fine-tuning event - #21 by sanchit-gandhi? If so and the error still persists, could you try the workarounds listed in this PyTorch thread?
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)` - #13 by NicoHambauer - PyTorch Forums
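
If it helps with debugging, forcing synchronous CUDA kernel launches usually makes the traceback point at the exact op that fails. This is a general PyTorch debugging step rather than a fix, and the script name below is just a placeholder:

$ CUDA_LAUNCH_BLOCKING=1 python your_fine_tuning_script.py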

Let me know if this continues to be an issue - more than happy to dig further on this!

@sanchit-gandhi, thank you for getting back to me.

Unfortunately, the first link in your response does not seem to be related to the problem I am facing, and the second link does not quite help.

So, this still remains an issue, I am afraid. Thanks again for your time.

The same code runs fine in the Google Colab environment, but it does not work in my environment, the details of which are in my initial post.


This is very cool! I’ve been working hard using Whisper on Hebrew, but I missed this specific event. Will there be a follow-up?

@sanchit-gandhi Hi, seconding @vkg’s issue. The two resources you linked don’t address the core problem here: the Whisper fine-tuning demo you authored works fine in Colab, but returns the following error when run in a local notebook: RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasGemmEx( handle, opa, opb, m, n, k, &falpha, a, CUDA_R_16F, lda, b, CUDA_R_16F, ldb, &fbeta, c, CUDA_R_16F, ldc, CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP)
I have verified this isn’t an issue with GPU memory, and it shouldn’t be a dimension mismatch, as the dataset processing is identical to the demo’s. Could you please take a second look at this issue? Thank you.

Hey @avoy and @vkg!

I’ve re-run everything on a local GPU and it all works on my side: sanchit-gandhi/whisper-small-hi-rerun · Hugging Face

Based on the RuntimeError message alone I can’t pinpoint what it could be, but I don’t think it’s an error with the script (since it runs on Colab and on a local GPU). Could it be a build mismatch between PyTorch and CUDA? IMO it’s worth posting the full traceback on the PyTorch forums, since it seems like the problem lies there!

The only other thing I think is worth trying is disabling fp16 and seeing if that changes the error (maybe an AMP backend problem?).
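
Concretely, that just means flipping the fp16 flag in your training arguments. A minimal sketch, keeping everything else from the fine-tuning setup unchanged (the output_dir and batch size below are placeholders):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-hi",   # placeholder - keep whatever you had before
    per_device_train_batch_size=16,    # placeholder
    fp16=False,                        # disable mixed precision to rule out an AMP / half-precision CUBLAS issue
    predict_with_generate=True,
)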

Hey @Sharonio! That’s really great to hear! We’ll definitely run more speech events in the future with whatever models/datasets are most exciting at the time! Feel free to ask if you have any questions regarding Whisper fine-tuning, as we’re still super keen to help where possible!

@sanchit-gandhi, thank you so much for following up, I appreciate it. I was able to solve the problem. The culprit was an older CUDA version. Thanks, again.
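
In case it helps anyone else hitting this, a quick way to spot the mismatch is to compare the CUDA version your PyTorch wheel was built against with the one your driver supports. These are generic checks, nothing specific to the event scripts:

import torch

print(torch.__version__)          # e.g. 1.13.1+cu117 -> wheel built against CUDA 11.7
print(torch.version.cuda)         # CUDA toolkit version PyTorch was compiled with
print(torch.cuda.is_available())  # whether the driver/GPU can actually be used
# then compare with the "CUDA Version" reported at the top of `nvidia-smi`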

Hey @vkg! Awesome to hear that! Best of luck with your Whisper fine-tuning runs - very excited to see what you build!