Bad request error when using inference endpoints: Cannot find backend for CPU

Hey,

I wrote a custom inference handler to deploy the Phi-3-small-128k-instruct model with LangChain, here: dordonezc/Phi-3-small-128k-instruct-4-endpoints · Hugging Face

However, when I try to send a request to the endpoint, I get the following error:

BadRequestError: (Request ID: 7I0Wm6)
Bad request:
Cannot find backend for CPU

I am deploying on an Nvidia T4 · 4x GPU · 64 GB instance.

I suspect something related to flash attention is not set up correctly: flash attention only has a CUDA backend, and the error mentions CPU, which makes me think the model is being loaded on CPU instead of the GPUs. Does anyone know what might be happening?
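For context, here is a minimal sketch of the kind of handler I mean (not my exact code; it assumes the standard Inference Endpoints `EndpointHandler` contract with `__init__`/`__call__`, and pins the model to GPU in fp16 so the CUDA-only flash attention backend can be used):

```python
# handler.py — minimal sketch of a custom Inference Endpoints handler.
# Assumption: flash attention only ships a CUDA backend, so the model must
# be placed on the GPU; loading it on CPU can trigger backend errors.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        self.tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
        self.model = AutoModelForCausalLM.from_pretrained(
            path,
            torch_dtype=torch.float16,  # fp16 so the weights fit on T4s
            device_map="auto",          # shard layers across the GPUs, not CPU
            trust_remote_code=True,
        )

    def __call__(self, data: dict) -> list:
        prompt = data.get("inputs", "")
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        with torch.no_grad():
            out = self.model.generate(**inputs, max_new_tokens=256)
        text = self.tokenizer.decode(out[0], skip_special_tokens=True)
        return [{"generated_text": text}]
```

If the handler instead loads the model without `device_map` (or with `device_map="cpu"`) while requesting `attn_implementation="flash_attention_2"`, I would expect exactly this kind of "no backend for CPU" failure.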