I trained a model and am using the new feature of deploying an adapter model directory from a repo on Hugging Face, but I get the following error when generating a response from the endpoint. The same handler.py gives a response when I run it in Colab.
ERROR | Expected a cuda device, but got: cpu
Along with a warning: UserWarning: Merge lora module to 4-bit linear may get different generations due to rounding errors.
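For context, a hedged sketch of the load path where both messages tend to appear (the repo IDs are placeholders, not my actual repositories; `merge_and_unload` is PEFT's merge API, and merging a LoRA into a 4-bit base is what triggers the rounding warning):

```python
# Sketch of the adapter-loading path where the error surfaces.
# "base/repo-id" and "adapter/repo-id" below are placeholders.

def expected_device(cuda_available: bool) -> str:
    """The device the merged model should land on. Merging a LoRA into a
    4-bit bitsandbytes base requires CUDA, which is why the endpoint can
    raise 'Expected a cuda device, but got: cpu' when no GPU is visible."""
    return "cuda" if cuda_available else "cpu"


def load_merged_model(base_id: str, adapter_id: str):
    # Heavy imports kept inside the function so the helper above stays
    # importable without torch/transformers installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        load_in_4bit=True,   # bitsandbytes 4-bit quantization
        device_map="auto",   # places weights on GPU when one is visible
    )
    model = PeftModel.from_pretrained(base, adapter_id)
    # merge_and_unload() is where the 4-bit rounding UserWarning is
    # emitted, and where a device mismatch shows up if the container
    # only sees the CPU.
    return model.merge_and_unload()
```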
I have a similar issue with Inference Endpoints.
I defined a custom handler following the docs, but when I try to load the model (LLaVA) with bitsandbytes quantization, it fails because no GPU is found.
Before trying to set up an endpoint, I played around with a Space via Gradio on the same hardware specs, and everything worked fine.
To me it seems like the GPU is not visible when the endpoint is initialized, but I could be wrong, as I'm completely new to Inference Endpoints.
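One way to confirm that theory is to log what the container sees the moment the handler is constructed, and to guard the quantized load explicitly. A minimal sketch, assuming the standard `EndpointHandler` shape from the custom-handler docs (the fallback behavior is my own choice, not anything the docs prescribe):

```python
# Minimal diagnostic handler sketch: print CUDA visibility at init and
# only request 4-bit quantization when a GPU is actually present.

def choose_load_kwargs(cuda_available: bool) -> dict:
    """Pick model-loading kwargs based on GPU visibility.

    4-bit bitsandbytes quantization requires CUDA, so with no GPU we
    fall back to a plain CPU load instead of crashing inside the loader.
    """
    if cuda_available:
        return {"device_map": "auto", "load_in_4bit": True}
    return {"device_map": {"": "cpu"}}


class EndpointHandler:
    def __init__(self, path: str = ""):
        # Imported here so the helper above stays testable without torch.
        import torch
        from transformers import AutoProcessor, AutoModelForCausalLM

        # First thing: does this container actually see the GPU?
        # This line ends up in the endpoint logs.
        print(f"cuda available at init: {torch.cuda.is_available()}")

        kwargs = choose_load_kwargs(torch.cuda.is_available())
        self.processor = AutoProcessor.from_pretrained(path)
        self.model = AutoModelForCausalLM.from_pretrained(path, **kwargs)
```

If the printed line in the endpoint logs says `cuda available at init: False` on GPU hardware, that would confirm the GPU is not exposed when the handler is initialized.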