How to access path to audio recording in datasets 4.0?

John6666 · September 17, 2025, 9:40pm

The way audio datasets are handled seems to have changed quite a bit.

Short answer: cast the column with decode=False and read example["audio"]["path"]. The AudioDecoder object in Datasets 4.x doesn’t expose a file path; it only decodes samples. (PyTorch Documentation)

Minimal fixes

Local files (your case):

from datasets import Dataset, Audio

ds = Dataset.from_dict({"audio": ["path/to/audio_1", "path/to/audio_2"]})
ds = ds.cast_column("audio", Audio(decode=False))     # keep path/bytes
print(ds[0]["audio"]["path"])                         # "path/to/audio_1"

Docs show this exact pattern and return structure. (Hugging Face)

Keep path + still use decoders:

# 1) expose path
ds = ds.cast_column("audio", Audio(decode=False))
ds = ds.map(lambda ex: {"audio_path": ex["audio"]["path"]})

# 2) switch back to decoder objects for modeling
ds = ds.cast_column("audio", Audio())                 # now AudioDecoder
# audio_path column stays available

The docs describe this behavior and recommend decode=False when you need the path or bytes. (Hugging Face)

Streaming datasets:

from datasets import load_dataset
ds = load_dataset("username/dataset", split="train", streaming=True).decode(False)
first = next(iter(ds))
print(first["audio"]["path"])                          # path or None if only bytes

.decode(False) disables feature decoding on a streaming dataset, so iterating yields the raw path/bytes dicts instead of decoder objects. (Hugging Face)
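To check which examples actually carry a path, you can walk the undecoded examples and collect the "path" field. Below is a minimal sketch: iter_audio_paths is a hypothetical helper (not part of datasets), and the list of dicts is only a stand-in for a real stream with decoding disabled. It assumes each undecoded audio value is a dict with "path" and "bytes" keys.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple


def iter_audio_paths(
    examples: Iterable[Dict[str, Any]], column: str = "audio"
) -> Iterator[Tuple[int, Optional[str]]]:
    """Yield (index, path) for each example whose audio column is undecoded.

    Hypothetical helper: expects the {"path": ..., "bytes": ...} dict you get
    when decoding is disabled. A decoder object raises TypeError instead.
    """
    for index, example in enumerate(examples):
        audio = example[column]
        if not isinstance(audio, dict):
            raise TypeError(
                f"example {index}: expected an undecoded dict, got {type(audio).__name__}; "
                "disable decoding first"
            )
        # "path" may be None when the dataset only stores raw bytes
        yield index, audio.get("path")


# Stand-in for a stream with decoding disabled (made-up values)
fake_stream = [
    {"audio": {"path": "clips/a.wav", "bytes": None}},
    {"audio": {"path": None, "bytes": b"RIFF"}},
]
paths = list(iter_audio_paths(fake_stream))
print(paths)
```

The same loop should work on a regular Dataset or a streaming one, since both iterate as example dicts.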

Notes

In v4, audio_dataset[0]["audio"] returns a TorchCodec AudioDecoder. Use .get_all_samples() for samples, but do not expect a path on that object. (Hugging Face)
Depending on the dataset, you may see a cache path or raw bytes when decoding is disabled. The docs show both possibilities. (Hugging Face)
v4 moved audio decoding from SoundFile to TorchCodec. The release notes confirm that AudioDecoder is the new default and that legacy dict-style indexing is kept only for the array and the sampling rate. (GitHub)
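Since the undecoded value can be a usable path, raw bytes, or both, a small fallback is handy if you always need a file on disk. This is a sketch under assumptions, not a datasets API: resolve_audio_path is a hypothetical helper, and the ".wav" default suffix is a guess you should adjust to your data.

```python
import os
import tempfile
from typing import Any, Dict, Optional


def resolve_audio_path(
    audio: Dict[str, Any], tmp_dir: Optional[str] = None, default_suffix: str = ".wav"
) -> Optional[str]:
    """Return a file path for an undecoded audio dict (hypothetical helper).

    Prefers an existing path on disk, otherwise writes the raw bytes to a
    temporary file. Falls back to whatever "path" holds (possibly None).
    """
    path = audio.get("path")
    data = audio.get("bytes")
    if path and os.path.exists(path):
        return path
    if data is not None:
        # Reuse the original extension when the stored path has one
        suffix = os.path.splitext(path)[1] if path else ""
        fd, tmp_path = tempfile.mkstemp(suffix=suffix or default_suffix, dir=tmp_dir)
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        return tmp_path
    return path


# Bytes-only example (made-up payload, not a valid audio file)
example_audio = {"path": None, "bytes": b"RIFF-placeholder"}
local_path = resolve_audio_path(example_audio)
print(local_path)
```

The caller owns the temporary file, so delete it when you are done with it.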

Helpful refs: HF “Load audio data” and “Dataset features” pages and the v4.0 release notes. (Hugging Face)
