Problem with pyannote.audio==3.1.0

This looks like a library version incompatibility…


Your import error comes from an API removal in torchaudio and an incompatible NumPy pin. Fix by upgrading pyannote.audio and undoing the NumPy downgrade. Keep your Torch 2.9 stack.

TL;DR fix

# clean conflicting pins
pip uninstall -y pyannote.audio pyannote.core pyannote.metrics pyannote.pipeline pyannote.database numpy

# install a compatible, modern set
pip install --upgrade "numpy>=2.3" "pyannote.audio>=4.0.1" --prefer-binary
# keep your existing torch==2.9.*, torchaudio==2.9.* and torchcodec

pyannote.audio>=4 removed the old torchaudio backend call and uses FFmpeg via torchcodec, so the import works on torchaudio≥2.2. NumPy≥2.x satisfies pyannote-core and pyannote-metrics. (GitHub)

Then restart the kernel once. Verify:

# refs:
# - torchaudio dispatcher notes: https://docs.pytorch.org/audio/main/torchaudio.html
# - pyannote model card: https://huggingface.co/pyannote/speaker-diarization-3.1
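import torchaudio, torchcodec
print("torchaudio:", torchaudio.__version__)
# list_audio_backends() may be absent on recent torchaudio releases, so guard the call
if hasattr(torchaudio, "list_audio_backends"):
    print("backends:", torchaudio.list_audio_backends())  # should show 'ffmpeg' and/or 'soundfile'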
from pyannote.audio import Pipeline
pipe = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", token="hf_xxx")  # do not hardcode secrets

set_audio_backend was deprecated and then removed in torchaudio 2.2, which is why pyannote.audio==3.1.0 fails to import on your current torchaudio. (PyTorch Docs)

Why your install failed

  • pyannote.audio==3.1.0 calls torchaudio.set_audio_backend("soundfile"). That function is gone in torchaudio≥2.2, so import raises AttributeError. Upgrading pyannote fixes it because 4.x removed that path. (GitHub)
  • You forced numpy==1.26. Current pyannote ecosystem components require NumPy≥2.0 (core) and ≥2.2.2 (metrics). Pip warned correctly. Use NumPy≥2.3. (GitHub)
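
A quick way to check an environment for these two conflicts is sketched below. The version thresholds are the ones quoted in this answer, not values read from package metadata, and the helper names (release_tuple, diagnose, installed_versions) are made up for illustration.

```python
from importlib import metadata
from typing import Dict, List, Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Turn '2.9.0+cu128' into (2, 9, 0); stop at the first non-numeric piece."""
    parts = []
    for piece in version.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def diagnose(versions: Dict[str, Optional[str]]) -> List[str]:
    """Return readable problems for the two conflicts described above."""
    problems = []
    audio = versions.get("pyannote.audio")
    ta = versions.get("torchaudio")
    np_version = versions.get("numpy")
    # Threshold from this answer: set_audio_backend is gone in torchaudio >= 2.2
    if audio and ta and release_tuple(audio) < (4,) and release_tuple(ta) >= (2, 2):
        problems.append("pyannote.audio<4 with torchaudio>=2.2: set_audio_backend is missing")
    # Threshold from this answer: current pyannote.core / pyannote.metrics want NumPy >= 2
    if np_version and release_tuple(np_version) < (2,):
        problems.append("numpy<2: may conflict with current pyannote.core / pyannote.metrics")
    return problems


def installed_versions(names=("pyannote.audio", "torchaudio", "numpy")):
    """Look up installed versions; missing packages map to None."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for line in diagnose(installed_versions()) or ["no known conflict detected"]:
        print(line)
```

Run it in the same kernel or venv that raises the error, since a different interpreter may see different packages.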

If you must stay on pyannote.audio==3.1.0 (not recommended)

Pick one, not both:

# Legacy stack that still has set_audio_backend
# torch 2.1.x wheels predate NumPy 2, so this stack stays on NumPy 1.x
pip install "torch<=2.1.2" "torchaudio<=2.1.2" "numpy<2" "pyannote.audio==3.1.0"

or a temporary shim:

# WARNING: local hack to import 3.1.0 with new torchaudio
import torchaudio
if not hasattr(torchaudio, "set_audio_backend"):
    torchaudio.set_audio_backend = lambda *a, **k: None
    torchaudio.get_audio_backend = lambda: "soundfile"
from pyannote.audio import Pipeline

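The first aligns versions to when the API existed. The second bypasses the call at import time so you can upgrade later; it only covers the import, so loading audio may still fail if 3.1.0 relies on other torchaudio I/O functions that your release no longer provides. (PyTorch Docs)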
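
If you go the shim route, wrapping it in a small function keeps it idempotent and easy to remove later. This is only a sketch: install_backend_shim is a hypothetical helper name, and the demo at the bottom uses a stand-in module rather than the real torchaudio.

```python
import types


def install_backend_shim(module) -> bool:
    """Add no-op backend functions to `module` when set_audio_backend is missing.

    Returns True if the module was patched, False if it already had
    set_audio_backend and was left untouched.
    """
    if hasattr(module, "set_audio_backend"):
        return False
    module.set_audio_backend = lambda *args, **kwargs: None
    if not hasattr(module, "get_audio_backend"):
        module.get_audio_backend = lambda: "soundfile"
    return True


# Demo on a stand-in module (hypothetical; not the real torchaudio)
fake_torchaudio = types.ModuleType("fake_torchaudio")
patched_first = install_backend_shim(fake_torchaudio)   # patches the stand-in
patched_again = install_backend_shim(fake_torchaudio)   # nothing left to patch
print(patched_first, patched_again)
```

With the real library you would call install_backend_shim(torchaudio) right after import torchaudio and before from pyannote.audio import Pipeline.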

Gating and FFmpeg checks

  • Accept the model terms for pyannote/speaker-diarization-3.1 on Hugging Face and pass a valid token, or downloads will fail. (Hugging Face)
  • pyannote.audio>=4 expects FFmpeg via torchcodec. You already verified FFmpeg and torchcodec, which matches the 4.x I/O design. (GitHub)
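
Both checks can be run before loading the pipeline. A minimal sketch, assuming the token is stored in the HF_TOKEN environment variable and that finding an ffmpeg executable on PATH is a good enough proxy for a usable FFmpeg install (torchcodec needs the FFmpeg libraries, so treat a pass here as necessary rather than sufficient). The helper names are hypothetical.

```python
import os
import shutil


def read_hf_token(env=None) -> str:
    """Read the Hugging Face token from the environment instead of hardcoding it."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; create a token and accept the model terms first")
    return token


def ffmpeg_on_path() -> bool:
    """Rough check: is an ffmpeg executable visible on PATH?"""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("ffmpeg on PATH:", ffmpeg_on_path())
    try:
        read_hf_token()
        print("HF_TOKEN found")
    except RuntimeError as err:
        print(err)
```

The returned token can then be passed as token=read_hf_token() to Pipeline.from_pretrained.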

Sanity test end-to-end

# refs in comments:
# https://huggingface.co/pyannote/speaker-diarization-3.1
# https://docs.pytorch.org/audio/main/torchaudio.html
import torch
from pyannote.audio import Pipeline
pipe = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", token="hf_xxx")
if torch.cuda.is_available():
    pipe.to(torch.device("cuda"))  # .to() expects a torch.device, not a str
result = pipe("sample.wav")  # 16 kHz mono recommended
print(result)

The model card confirms “pyannote.audio version 3.1 or higher,” so using 4.x is valid and simpler on modern Torch. (Hugging Face)
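
To read the result as speaker turns rather than printing the raw object, something like the sketch below may help. It assumes that 4.x wraps the annotation in an output object exposing speaker_diarization while 3.x returns the annotation directly, and that the annotation supports itertracks(yield_label=True). speaker_turns and the stand-in class are hypothetical names used only for the demo.

```python
from types import SimpleNamespace


def speaker_turns(result):
    """Yield (start, end, speaker) from a diarization result.

    Assumption: newer releases expose the annotation as
    `result.speaker_diarization`; older ones return it directly.
    """
    annotation = getattr(result, "speaker_diarization", result)
    for segment, _track, speaker in annotation.itertracks(yield_label=True):
        yield segment.start, segment.end, speaker


class _DemoAnnotation:
    """Stand-in with the same itertracks(yield_label=True) shape, for demonstration only."""

    def __init__(self, rows):
        self._rows = rows

    def itertracks(self, yield_label=False):
        for start, end, speaker in self._rows:
            segment = SimpleNamespace(start=start, end=end)
            yield (segment, "A", speaker) if yield_label else (segment, "A")


demo = _DemoAnnotation([(0.0, 1.5, "SPEAKER_00"), (1.5, 3.0, "SPEAKER_01")])
turns = list(speaker_turns(demo))
wrapped = list(speaker_turns(SimpleNamespace(speaker_diarization=demo)))
for start, end, speaker in turns:
    print(f"{speaker}: {start:.1f}s to {end:.1f}s")
```

With a real pipeline you would pass result from the sanity test above to speaker_turns(result).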

Extra context and references

  • Torchaudio 2.2+ removed set_audio_backend and switched to a dispatcher. That is the precise cause of your AttributeError. (PyTorch Docs)
  • pyannote 4.x release notes: removed sox/soundfile backends; use FFmpeg or in-memory audio. Explains why 4.x works on Windows with torchcodec. (GitHub)
  • NumPy≥2 requirement in the pyannote stack. Avoid forcing 1.26. (GitHub)

Deleting the venv is optional. Uninstalling and reinstalling with the versions above, followed by one kernel restart, is sufficient.