Inference Endpoint: `'NoneType' object is not callable`. No other context to go on [Pyannote speaker diarization]

I’m trying to create a pyannote Inference Endpoint with a custom inference handler.

I’m using Hugging Face’s example by @philschmid virtually as-is: philschmid/pyannote-speaker-diarization-endpoint on the Hugging Face Hub.

I’ve run into 2 problems:

(1) I created a project from scratch, heavily borrowing from @philschmid’s example, but adding `pyannote-audio~=2.1` directly to `requirements.txt` (his example used a pyannote clone because of a Hugging Face dependency conflict that has since been fixed upstream).

It runs fine locally, and it deploys to Hugging Face without problems.
However, when I send any input to the deployed endpoint, I get the following error: `ERROR | 'NoneType' object is not callable`.

There is no other context whatsoever to go on.
This is the first line logged after the three log lines that appear to belong to endpoint initialization:

```
bm7hp 2022-11-04T23:13:20.279Z 2022-11-04 23:13:20,279 | INFO | Initializing model from directory:/repository
bm7hp 2022-11-04T23:13:20.279Z 2022-11-04 23:13:20,279 | INFO | Found custom pipeline at /repository/handler.py
bm7hp 2022-11-04T23:13:21.392Z 2022-11-04 23:13:21,392 | INFO | Model initialized successfully
bm7hp 2022-11-04T23:16:08.468Z 2022-11-04 23:16:08,468 | ERROR | 'NoneType' object is not callable
```
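For context, the message itself is Python’s standard `TypeError` for calling a value that is `None`. A minimal sketch of how that can surface only at request time, assuming (this is not confirmed anywhere in the thread) a loader that returns `None` instead of raising when it cannot fetch a model. `fetch_pipeline` is a hypothetical stand-in, not a pyannote API:

```python
from typing import Any, Callable, Optional


def fetch_pipeline(authorized: bool) -> Optional[Callable[[str], Any]]:
    """Hypothetical loader: returns None instead of raising when access fails."""
    if not authorized:
        return None
    return lambda audio_path: f"diarization of {audio_path}"


# "Initialization" succeeds even though nothing usable was loaded.
pipeline = fetch_pipeline(authorized=False)

try:
    pipeline("sample.wav")  # the failure only surfaces when a request arrives
except TypeError as err:
    print(err)  # 'NoneType' object is not callable
```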

I tried adding logging in `EndpointHandler.__init__` and `__call__`, but nothing from it shows up. The error must be happening even before `__init__` is called.
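A sketch of a handler that logs to stdout and fails fast in `__init__` when the pipeline did not load, so a missing pipeline shows up at startup with a readable message instead of at request time. It follows the `EndpointHandler` shape used by custom handlers (`__init__(self, path)` and `__call__(self, data)`); the `loader` parameter, the `load_pipeline` function, and the `{"result": ...}` response shape are hypothetical (in a real handler the loader is where `Pipeline.from_pretrained(...)` would go), and whether stdout reaches the endpoint logs is an assumption:

```python
import logging
import sys
from typing import Any, Callable, Dict, Optional

# Send log records to stdout (assumption: the platform captures stdout).
logging.basicConfig(stream=sys.stdout, level=logging.INFO, force=True)
logger = logging.getLogger("handler")


def load_pipeline(path: str) -> Optional[Callable[[Any], Any]]:
    """Hypothetical stand-in for the real model loader; always returns None here."""
    return None


class EndpointHandler:
    def __init__(self, path: str = "", loader=load_pipeline):
        logger.info("EndpointHandler.__init__ called with path=%r", path)
        self.pipeline = loader(path)
        if self.pipeline is None:
            # Fail at startup with a clear message instead of a bare
            # "'NoneType' object is not callable" on the first request.
            raise RuntimeError(f"pipeline could not be loaded from {path!r}")
        logger.info("pipeline loaded")

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        logger.info("EndpointHandler.__call__ received keys: %s", sorted(data))
        return {"result": self.pipeline(data["inputs"])}
```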

(2) Just in case, I also tried cloning @philschmid’s repo above directly.

I get an error on deploy complaining that files in the pyannote/speaker-diarization repo cannot be accessed, either because they don’t exist or because I’m unauthorized. (I don’t have the exact error at hand, but I can reproduce it if needed.)

I ran into similar errors while implementing (1) and solved them by accepting pyannote’s terms of service on their repos. I’m not sure why these errors recur here.

Any pointers would be much appreciated.

Hello @ataiii

There is a hiccup with pyannote.audio and huggingface_hub. You can read more in the pyannote/pyannote-audio GitHub issue #1100, “[BUG] parameter 'segmentation_onset' does not exist”.

This might be the reason for your issues. Can you share your repository?

I checked why the example is no longer working, and it turns out that pyannote has added an access gate to the pyannote/speaker-diarization repository on Hugging Face.

I’ll try to find time to adapt my example. In the meantime, you could accept the gate and put the model into the same repository as handler.py instead of loading it from the Hub.
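A minimal sketch of the second suggestion, assuming the pipeline’s `config.yaml` has been copied into the root of the endpoint repository next to handler.py (and that any model references inside it also point to files in the repo rather than Hub IDs). The file name and layout are assumptions. The helper only resolves and validates the local path; in pyannote.audio 2.x, `Pipeline.from_pretrained` should also accept a path to a local config file, which is what the commented lines assume:

```python
from pathlib import Path


def local_pipeline_config(repo_dir: str, filename: str = "config.yaml") -> Path:
    """Return the path of a pipeline config stored inside the endpoint repo.

    Raises FileNotFoundError at handler start-up if the file is missing,
    so the problem is reported before any request is served.
    """
    config_path = Path(repo_dir) / filename
    if not config_path.is_file():
        raise FileNotFoundError(f"expected a pipeline config at {config_path}")
    return config_path


# In the handler's __init__ (sketch, assumes pyannote.audio is installed):
#     from pyannote.audio import Pipeline
#     self.pipeline = Pipeline.from_pretrained(str(local_pipeline_config(path)))
```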

Hey @philschmid, thanks for looking into this.

I updated the pipeline fetch to use my auth token, and now it seems to work!
```python
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization@2.1", use_auth_token="XXX")
```
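Hard-coding the token as in the line above means committing it to the repository. A minimal sketch that reads it from an environment variable instead; the variable name `HF_TOKEN` and the helper names are assumptions (use whatever secret or environment mechanism your deployment provides). The pyannote import is deferred so the token helper can be checked without pyannote installed, and the `None` check is a defensive guard in case the loader returns `None` rather than raising:

```python
import os
from typing import Optional


def get_hf_token(env_var: str = "HF_TOKEN") -> Optional[str]:
    """Return the Hugging Face access token from the environment, or None."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def load_diarization_pipeline(model_id: str = "pyannote/speaker-diarization@2.1"):
    # Deferred import: pyannote.audio is only needed when actually loading.
    from pyannote.audio import Pipeline

    token = get_hf_token()
    if token is None:
        raise RuntimeError("HF_TOKEN is not set; the gated pyannote repo needs a token")
    pipeline = Pipeline.from_pretrained(model_id, use_auth_token=token)
    if pipeline is None:
        # Guard against a loader that returns None instead of raising.
        raise RuntimeError(f"could not load {model_id!r}; check the token and repo gate")
    return pipeline
```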

Honestly I’m not sure why, because fetching the pipeline didn’t seem like the issue I was having in (1).
Perhaps some dependency was updated?
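On the dependency question: `~=2.1` in requirements.txt is pip’s compatible-release operator from PEP 440, equivalent to `>=2.1, ==2.*`, so a fresh build can resolve to a newer 2.x release than an earlier build did (whether that happened here is not established). A simplified sketch of the rule, handling only plain numeric versions such as `2.1.1` (no pre-, post-, or dev-release tags):

```python
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a plain numeric release such as '2.1.1' into (2, 1, 1)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release(version: str, spec: str) -> bool:
    """Return True if `version` satisfies the PEP 440 clause `~=spec`.

    `~=2.1` means `>=2.1, ==2.*`: every component of the spec except the
    last must match exactly, and the version must not be older than the spec.
    """
    v, s = parse_release(version), parse_release(spec)
    if len(s) < 2:
        raise ValueError("~= needs at least two release components")
    prefix = s[:-1]
    # Pad with zeros so '2' compares like '2.0' against '2.1'.
    width = max(len(v), len(s))
    v_padded = v + (0,) * (width - len(v))
    s_padded = s + (0,) * (width - len(s))
    return v[: len(prefix)] == prefix and v_padded >= s_padded
```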

Anyway, it’s working now on my repo!

Hello @ataiii,

Yesterday I fixed my example, philschmid/pyannote-speaker-diarization-endpoint on the Hugging Face Hub.

Hi @philschmid

I’m still getting an error at runtime (a new one):

huggingface_diarization_result:  {"error":"CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)`"}

I get this error both with a direct clone of your repo and with my own similar repo.

I also deployed an instance from my own repo a few days ago (as I mentioned above). That instance is still running and has no issues.
Redeploying the exact same repo now consistently yields this error. I tried multiple cloud providers and machine types.
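Since the same repo behaves differently in an older and a newer deployment, one way to narrow this down is to log the runtime environment at startup of each new deployment and compare the output across deployments. A sketch using only the standard library, plus PyTorch if it is installed; the package list is an assumption, so adjust it to match your requirements.txt:

```python
import platform
from importlib import metadata
from typing import Dict

# Packages whose versions are worth comparing between deployments (assumed list).
PACKAGES = ("torch", "pyannote.audio", "huggingface_hub")


def environment_report() -> Dict[str, str]:
    """Collect Python, package, and (if available) CUDA details as strings."""
    report = {"python": platform.python_version()}
    for name in PACKAGES:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    try:
        import torch  # optional: only needed for the CUDA details below
    except ImportError:
        return report
    report["torch.version.cuda"] = str(torch.version.cuda)
    report["cuda_available"] = str(torch.cuda.is_available())
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print(f"{key}: {value}")
```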

Could this be some sort of configuration issue on Hugging Face’s end?