Bad request error when using inference endpoints: Cannot find backend for CPU

Hey,

I wrote a custom inference handler to deploy the Phi-3-small-128k-instruct model with LangChain, here: dordonezc/Phi-3-small-128k-instruct-4-endpoints · Hugging Face

However, when I try to send a request to the endpoint, I get the following error:

BadRequestError: (Request ID: 7I0Wm6)
Bad request:
Cannot find backend for CPU

I am deploying on an Nvidia T4 · 4x GPU · 64 GB instance.

I suspect something related to flash attention is not set up correctly: flash attention only has a CUDA backend, and the error mentions CPU, which makes me think the model is being loaded on CPU instead of the GPUs. Does anyone know what might be happening?
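For context, here is a minimal sketch of the kind of handler I mean (not my exact code; it assumes the standard Inference Endpoints `EndpointHandler` contract with `__init__`/`__call__`, and pins the model to GPU in fp16 so the CUDA-only flash attention backend can be used):

```python
# handler.py — minimal sketch of a custom Inference Endpoints handler.
# Assumption: flash attention only ships a CUDA backend, so the model must
# be placed on the GPU; loading it on CPU can trigger backend errors.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class EndpointHandler:
    def __init__(self, path: str = ""):
        self.tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
        self.model = AutoModelForCausalLM.from_pretrained(
            path,
            torch_dtype=torch.float16,  # fp16 so the weights fit on T4s
            device_map="auto",          # shard layers across the GPUs, not CPU
            trust_remote_code=True,
        )

    def __call__(self, data: dict) -> list:
        prompt = data.get("inputs", "")
        inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
        with torch.no_grad():
            out = self.model.generate(**inputs, max_new_tokens=256)
        text = self.tokenizer.decode(out[0], skip_special_tokens=True)
        return [{"generated_text": text}]
```

If the handler instead loads the model without `device_map` (or with `device_map="cpu"`) while requesting `attn_implementation="flash_attention_2"`, I would expect exactly this kind of "no backend for CPU" failure.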