This looks like a library version incompatibility.
Your import error comes from an API removal in torchaudio and an incompatible NumPy pin. Fix by upgrading pyannote.audio and undoing the NumPy downgrade. Keep your Torch 2.9 stack.
TL;DR fix
# clean conflicting pins
pip uninstall -y pyannote.audio pyannote.core pyannote.metrics pyannote.pipeline pyannote.database numpy
# install a compatible, modern set
pip install --upgrade "numpy>=2.3" "pyannote.audio>=4.0.1" --prefer-binary
# keep your existing torch==2.9.*, torchaudio==2.9.* and torchcodec
pyannote.audio>=4 removed the old torchaudio backend call and uses FFmpeg via torchcodec, so the import works on torchaudio≥2.2. NumPy≥2.x satisfies pyannote-core and pyannote-metrics. (GitHub)
Then restart the kernel once. Verify:
# refs:
# - torchaudio dispatcher notes: https://docs.pytorch.org/audio/main/torchaudio.html
# - pyannote model card: https://huggingface.co/pyannote/speaker-diarization-3.1
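import torchaudio, torchcodec
# list_audio_backends() may be missing on recent torchaudio builds, so guard the call
if hasattr(torchaudio, "list_audio_backends"):
    print("backends:", torchaudio.list_audio_backends())  # should show 'ffmpeg' and/or 'soundfile'
else:
    print("torchaudio", torchaudio.__version__, "has no list_audio_backends()")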
from pyannote.audio import Pipeline
pipe = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", token="hf_xxx") # do not hardcode secrets
set_audio_backend was deprecated, then removed in torchaudio 2.2+, which is why pyannote.audio==3.1.0 fails to import on your current torchaudio. (PyTorch Docs)
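To confirm that the environment really ended up on the minimums from the install command above (numpy>=2.3, pyannote.audio>=4.0.1), a small check along these lines may help. It is only a sketch: `version_tuple`, `meets_minimum` and `report` are hypothetical helper names, and the parser deliberately ignores local tags and pre-release suffixes instead of implementing full version semantics.

```python
from importlib import metadata

# Minimums taken from the install command above.
MINIMUMS = {"numpy": "2.3", "pyannote.audio": "4.0.1"}


def version_tuple(text):
    """Turn '2.3.1' or '2.9.0+cu128' into a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True when the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def report(minimums=MINIMUMS):
    """Map each package name to (installed version or None, meets minimum)."""
    results = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            results[name] = (None, False)
            continue
        results[name] = (installed, meets_minimum(installed, minimum))
    return results


if __name__ == "__main__":
    for name, (installed, ok) in report().items():
        status = "ok" if ok else "needs attention"
        print(f"{name}: {installed} ({status})")
```

Run it in the same kernel or venv you use for pyannote, after the restart, so it reads the metadata of the packages that will actually be imported.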
Why your install failed
- pyannote.audio==3.1.0 calls torchaudio.set_audio_backend("soundfile"). That function is gone in torchaudio≥2.2, so import raises AttributeError. Upgrading pyannote fixes it because 4.x removed that path. (GitHub)
- You forced numpy==1.26. Current pyannote ecosystem components require NumPy≥2.0 (core) and ≥2.2.2 (metrics). Pip warned correctly. Use NumPy≥2.3. (GitHub)
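The first cause can be checked without importing pyannote at all, by asking whether the module still exposes the legacy functions. The sketch below uses a hypothetical helper, `missing_attributes`, and two stand-in objects so it runs without torchaudio installed; the stand-ins only mimic the presence or absence of the two names and are not real torchaudio modules.

```python
import types

# Legacy torchaudio functions that pyannote.audio==3.1.0 relies on at import time.
LEGACY_NAMES = ("set_audio_backend", "get_audio_backend")


def missing_attributes(module, names=LEGACY_NAMES):
    """Return the names from `names` that `module` does not provide."""
    return [name for name in names if not hasattr(module, name)]


# Stand-ins for illustration only (assumed shapes, not real torchaudio builds).
new_style = types.SimpleNamespace(__version__="2.9.0")
old_style = types.SimpleNamespace(
    __version__="2.1.2",
    set_audio_backend=lambda *args, **kwargs: None,
    get_audio_backend=lambda: "soundfile",
)

print(missing_attributes(new_style))
print(missing_attributes(old_style))
```

In a real environment you would pass the imported module, for example `import torchaudio; print(missing_attributes(torchaudio))`. A non-empty list means the 3.1.0 import will hit the AttributeError described above.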
If you must stay on pyannote.audio==3.1.0 (not recommended)
Pick one, not both:
# Legacy stack that still has set_audio_backend
pip install "torch<=2.1.2" "torchaudio<=2.1.2" "numpy>=2.0,<3" "pyannote.audio==3.1.0"
or a temporary shim:
# WARNING: local hack to import 3.1.0 with new torchaudio
import torchaudio
if not hasattr(torchaudio, "set_audio_backend"):
    torchaudio.set_audio_backend = lambda *a, **k: None
    torchaudio.get_audio_backend = lambda: "soundfile"
from pyannote.audio import Pipeline
The first aligns versions to when the API existed. The second bypasses the call so you can upgrade later. (PyTorch Docs)
Gating and FFmpeg checks
- Accept the model terms for pyannote/speaker-diarization-3.1 on Hugging Face and pass a valid token, or downloads will fail. (Hugging Face)
- pyannote.audio>=4 expects FFmpeg via torchcodec. You already verified FFmpeg and torchcodec, which matches the 4.x I/O design. (GitHub)
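Both checks can be run up front before loading the pipeline. The following is a minimal sketch with some assumptions: `check_prerequisites` is a hypothetical helper, `HF_TOKEN` is an assumed environment variable name for the token, and finding an `ffmpeg` executable on PATH is only a rough proxy, since torchcodec loads FFmpeg's libraries rather than calling the command-line tool.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all clear."""
    env = os.environ if env is None else env
    problems = []
    # Reading the token from the environment keeps it out of source code.
    if not env.get("HF_TOKEN"):
        problems.append("HF_TOKEN is not set; gated model downloads will fail")
    # Rough proxy only: the executable being on PATH suggests FFmpeg is installed.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

If the list comes back empty, the token can then be passed as `token=os.environ["HF_TOKEN"]` instead of a literal string.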
Sanity test end-to-end
# refs in comments:
# https://huggingface.co/pyannote/speaker-diarization-3.1
# https://docs.pytorch.org/audio/main/torchaudio.html
import torch
from pyannote.audio import Pipeline
pipe = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", token="hf_xxx")
if torch.cuda.is_available():
    pipe.to(torch.device("cuda"))  # pass a torch.device, as the model card does
result = pipe("sample.wav") # 16 kHz mono recommended
print(result)
The model card confirms “pyannote.audio version 3.1 or higher,” so using 4.x is valid and simpler on modern Torch. (Hugging Face)
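Since the sanity test recommends 16 kHz mono input, it can be useful to inspect the file before running the pipeline. This is a sketch using only the standard library; `wav_format` and `is_16k_mono` are hypothetical helper names, and the `wave` module reads uncompressed PCM WAV files only, so other formats would need a different tool.

```python
import wave


def wav_format(path):
    """Return (sample_rate_hz, channel_count) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate(), wav.getnchannels()


def is_16k_mono(path):
    """True when the file is 16 kHz with a single channel."""
    return wav_format(path) == (16000, 1)
```

For example, `is_16k_mono("sample.wav")` returning False is a hint to resample or downmix the file first, although the recommendation above is not a hard requirement.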
Extra context and references
- Torchaudio 2.2+ removed set_audio_backend and switched to a dispatcher. That is the precise cause of your AttributeError. (PyTorch Docs)
- pyannote 4.x release notes: removed sox/soundfile backends; use FFmpeg or in-memory audio. Explains why 4.x works on Windows with torchcodec. (GitHub)
- NumPy≥2 requirement in the pyannote stack. Avoid forcing 1.26. (GitHub)
Deleting the venv is optional. Uninstall→reinstall with the versions above and one kernel restart is sufficient.