Issue with TorchCodec when fine-tuning Whisper ASR model

junnyrong · October 21, 2025, 7:37am

Hello,

In the past I have been fine tuning the Whisper-tiny ASR model using these guides:

It was all working fine. I was able to do everything locally, such as loading a pre-trained Whisper-tiny model and my own dataset, until I recently updated the modules. Since then I have been getting errors like these:

I have tried falling back to the samples provided by the guides, and they also seem to have broken and now give the same error. I also tried running them on Google Colab, where the runtime crashes when running a cell like this:

I would like to know if anyone else is also facing the same issue and if there are any solutions for it. Thanks in advance!

John6666 · October 21, 2025, 8:37am

This error appears to stem from changes to the audio backend in the datasets library. The quickest workaround may be to pin the older version with pip install datasets==3.6.0. Additionally, from version 4.0.0 onward, datasets that rely on a builder script can no longer be loaded directly from the Hub, so you will need to find a copy that has already been converted to the standard format. If your original datasets were already standard datasets, this second issue should not be a problem.

Additionally, since Transformers underwent significant changes around version 4.49.0, if you encounter errors related to Whisper, rolling transformers back to version 4.48.3 or earlier would be the simplest workaround. Rewriting your code for the new version is preferable, of course, but the rollback works as a temporary fix.

Your error started after upgrading to Datasets 4.x. 4.x switched audio decoding to TorchCodec, which loads FFmpeg at runtime and also requires a matching torch↔torchcodec pair. Accessing or printing an Audio column now triggers that decode path, so if FFmpeg is missing or versions don’t line up, you see the probe-and-fail chain (core7 → core6 → core5 → core4 ... Could not load torchcodec). On Windows this is more brittle, and early 4.0 notes even said Windows was not supported yet. (Hugging Face)

Why it broke now

Behavior change in Datasets 4.x: audio is decoded on access via TorchCodec + FFmpeg. Older 3.x used a different backend. Printing an example decodes it. (Hugging Face)
New runtime requirements: TorchCodec expects FFmpeg on the system and a compatible torch version. The README documents FFmpeg support and the torch↔torchcodec matrix. (GitHub)
Windows caveat: initial 4.0 release notes warned “not available for Windows yet; use datasets<4.0.” This explains why your previously working Windows setup started failing after upgrade. (GitHub)
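To see which side of this change an environment is on, a small sketch like the following may help. It assumes, as described above, that the switch to TorchCodec happened at datasets 4.0. The helper name uses_torchcodec_decoding is made up for illustration and is not part of any library.

```python
from importlib import metadata


def uses_torchcodec_decoding(datasets_version: str) -> bool:
    """Return True if this datasets version is 4.x or later.

    Assumption (from the notes above): audio decoding moved to TorchCodec in 4.0.
    """
    major = int(datasets_version.split(".")[0])
    return major >= 4


if __name__ == "__main__":
    try:
        # Reads the installed version without importing the package itself.
        installed = metadata.version("datasets")
    except metadata.PackageNotFoundError:
        print("datasets is not installed")
    else:
        print(installed, "-> TorchCodec decoding:", uses_torchcodec_decoding(installed))
```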

Typical root causes

FFmpeg missing or wrong major. TorchCodec supports FFmpeg majors 4–7 on all platforms, with 8 only on macOS/Linux. Missing or mismatched DLLs yield your exact probe sequence. (GitHub)
Torch↔TorchCodec mismatch. Use the official matrix. Example: torchcodec 0.7 ↔ torch 2.8; 0.8 ↔ 2.9. (GitHub)
Fresh 4.0 regressions. Multiple reports show 3.x works then 4.x fails until TorchCodec+FFmpeg are added and versions pinned. (GitHub)
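The torch↔torchcodec pairing can be checked mechanically. The sketch below only knows the two pairs quoted above (0.7 ↔ 2.8 and 0.8 ↔ 2.9); the full matrix lives in the TorchCodec README, so extend the table from there rather than guessing. The names KNOWN_PAIRS, major_minor, and check_pair are illustrative, not library APIs.

```python
# Pairs taken from the examples quoted above (torchcodec -> torch).
# This table is deliberately incomplete; consult the official matrix for more.
KNOWN_PAIRS = {"0.7": "2.8", "0.8": "2.9"}


def major_minor(version: str) -> str:
    """Reduce a version such as '2.8.0+cu126' to '2.8'."""
    public = version.split("+")[0]  # drop local build tags like '+cpu'
    return ".".join(public.split(".")[:2])


def check_pair(torch_version: str, torchcodec_version: str) -> str:
    """Return 'ok', 'mismatch', or 'unknown' (pair not in the small table)."""
    expected = KNOWN_PAIRS.get(major_minor(torchcodec_version))
    if expected is None:
        return "unknown"
    return "ok" if major_minor(torch_version) == expected else "mismatch"
```

Importing torchcodec can itself fail when the pair is wrong, so it may be safer to feed this from importlib.metadata.version("torch") and importlib.metadata.version("torchcodec"), which read the installed versions without importing either package.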

Fixes and workarounds

Pick one path. Keep it pinned.

A) Fastest unblock on Windows

# Downgrade Datasets to pre-TorchCodec behavior
pip install "datasets<4.0.0"  # release notes flagged Windows not ready
# https://github.com/huggingface/datasets/releases/tag/4.0.0

(GitHub)

B) Stay on Datasets 4.x and make it work

# Windows CPU: install FFmpeg and match versions
conda install -c conda-forge "ffmpeg<8"        # README recommends conda FFmpeg
pip install "torch==2.8.*" "torchcodec==0.7.*" # matrix: 0.7 <-> 2.8
# https://github.com/meta-pytorch/torchcodec#installing-torchcodec

If you need CUDA on Windows, use the experimental conda package:

conda install -c conda-forge "ffmpeg<8" "torchcodec=*=*cuda*"
# https://github.com/meta-pytorch/torchcodec#installing-cuda-enabled-torchcodec

(GitHub)

C) Linux or Colab

# Colab VM or Linux
apt-get update && apt-get install -y ffmpeg
pip install -U "datasets[audio]" "torch==2.8.*" "torchcodec==0.7.*"
# HF docs: audio decoding uses TorchCodec + FFmpeg
# https://huggingface.co/docs/datasets/en/audio_load

(Hugging Face)

D) Bypass decoding while you train

Avoid TorchCodec until your env is fixed.

from datasets import Audio
# `ds` is your already-loaded dataset.
# Option 1: disable globally
# Note: decode() is documented on IterableDataset (streaming); if your Dataset object lacks it, use Option 2.
ds = ds.decode(False)  # https://huggingface.co/docs/datasets/en/package_reference/main_classes#datasets.Dataset.decode
# Option 2: disable per column
ds = ds.cast_column("audio", Audio(decode=False))  # https://huggingface.co/docs/datasets/en/about_dataset_features

These return paths/bytes rather than decoded arrays, so printing items won’t invoke TorchCodec. (Hugging Face)
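With decoding disabled, each audio field is a dict with "path" and "bytes" keys, and you have to decode it yourself. As a minimal sketch, assuming the files are 16-bit PCM WAV, the standard library is enough. Other formats (MP3, FLAC, and so on) need a real decoder, and the function name read_pcm16_wav is hypothetical.

```python
import array
import io
import sys
import wave


def read_pcm16_wav(audio_field: dict) -> tuple[list[float], int]:
    """Decode an undecoded audio field ({"path": ..., "bytes": ...}) with the stdlib.

    Assumption: the audio is 16-bit PCM WAV. Anything else raises an error.
    Returns (mono samples scaled to [-1.0, 1.0), sample rate).
    """
    raw = audio_field.get("bytes")
    # Prefer in-memory bytes; fall back to the file path.
    source = io.BytesIO(raw) if raw is not None else audio_field["path"]
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM WAV is handled in this sketch")
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Keep the first channel only, then scale int16 to floats.
    mono = samples[::channels]
    return [s / 32768.0 for s in mono], rate
```

Whisper's feature extractor expects 16 kHz input, so resample first if the returned rate differs.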

Sanity checks

python - <<'PY'
import subprocess, sys
import torch
print("python:", sys.version)
print("torch:", torch.__version__)
try:
    import torchcodec
    print("torchcodec:", torchcodec.__version__)
except Exception as e:
    print("torchcodec import failed:", e)
try:
    subprocess.run(["ffmpeg", "-hide_banner", "-version"])
except FileNotFoundError:
    print("ffmpeg not found on PATH")
PY
# Matrix and FFmpeg policy:
# https://github.com/meta-pytorch/torchcodec#installing-torchcodec

(GitHub)
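The FFmpeg policy quoted earlier (majors 4–7 everywhere, 8 only on macOS/Linux) can also be checked in code. This is a rough sketch with made-up helper names. Note that TorchCodec loads FFmpeg's shared libraries rather than the ffmpeg executable, so the CLI version is only a proxy for what will actually be loaded.

```python
import re
import shutil
import subprocess
import sys
from typing import Optional


def ffmpeg_major(version_output: str) -> Optional[int]:
    """Extract the major version from the first line of `ffmpeg -version`.

    Returns None when the build string is not numeric (e.g. git snapshots).
    """
    match = re.match(r"ffmpeg version n?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def is_supported(major: Optional[int], platform: str = sys.platform) -> bool:
    """Apply the policy quoted in this thread: 4-7 everywhere, 8 on macOS/Linux."""
    if major is None:
        return False  # cannot confirm an unparseable version
    if 4 <= major <= 7:
        return True
    return major == 8 and platform.startswith(("linux", "darwin"))


if __name__ == "__main__":
    exe = shutil.which("ffmpeg")
    if exe is None:
        print("ffmpeg not found on PATH")
    else:
        out = subprocess.run([exe, "-version"], capture_output=True, text=True).stdout
        major = ffmpeg_major(out)
        print("ffmpeg major:", major, "supported:", is_supported(major))
```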

Context from your linked thread

Your screenshots show Datasets 4.x decoding an Audio column, TorchCodec probing FFmpeg 7→6→5→4, then failing. That matches the new 4.x behavior and the FFmpeg/compatibility requirements above. (Hugging Face Forums)

Extra references and pitfalls

Release notes roundup: breaking changes, removal of scripts, and the Windows note. Useful if other 4.0 changes surfaced after your upgrade. (NewReleases)
Known mismatch/FFmpeg pitfalls: reports of brew-FFmpeg conflicts and version-mismatch guidance from TorchCodec maintainers. (GitHub)
PyTorch/Torchaudio migration: decoding is consolidating on TorchCodec (load_with_torchcodec exists as a bridge). Aligns your stack with where the ecosystem is going. (PyTorch Documentation)

junnyrong · October 22, 2025, 1:45am

I was pulling my hair out thinking it had something to do with TorchCodec’s versioning. It never occurred to me that it might have been datasets! Thank you so much for the detailed explanation too, that solved my issue.

system · October 22, 2025, 1:45pm

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
