Inference Endpoints fail to start

I am struggling to deploy Inference Endpoints. I tried to start a few:

  • TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ
  • s3nh/lmsys-vicuna-7b-v1.5-16k-GGML
  • PygmalionAI/pygmalion-6b
  • TheBloke/WizardLM-7B-uncensored-GPTQ
    and all of them fail spectacularly, and I do not really know where to start debugging or searching for errors.
    I always used the recommended/automatic model suggestion, and if none was suggested I used an accelerated instance, because all of those models are either Text Completion or Conversational.
    Common errors I encountered were:
  • Error: ShardCannotStart
  • Application startup failed. Exiting.
  • And also a very general error which I cannot reproduce right now (I already deleted the endpoints)

Maybe I am too blind to see the elephant in the room, or it is camouflaging itself very well.

Many thanks in advance and have a great day
Greetings, qbin :smiley:


I am not experienced with Inference Endpoints, but many of those models are GGML or GPTQ quantizations. GGML models need a llama.cpp-based runtime rather than plain transformers, and GPTQ models need ExLlama or AutoGPTQ as a backend.
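To make the distinction concrete, here is a small illustrative sketch (not part of any Hugging Face API — the function name and the runtime labels are my own) that classifies a repo by the quantization hint in its name, which is roughly what you would do by eye before picking an endpoint:

```python
def suggest_runtime(repo_id: str) -> str:
    """Rough heuristic: guess which runtime a model repo needs
    from quantization markers in its name. Purely illustrative."""
    name = repo_id.lower()
    if "ggml" in name or "gguf" in name:
        # GGML/GGUF weights target llama.cpp-style C++ runtimes,
        # not the standard transformers loading path.
        return "llama.cpp"
    if "gptq" in name:
        # GPTQ weights need a GPTQ-aware backend.
        return "AutoGPTQ / ExLlama"
    # Plain safetensors/PyTorch checkpoints load with transformers.
    return "transformers"

for repo in [
    "TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ",
    "s3nh/lmsys-vicuna-7b-v1.5-16k-GGML",
    "PygmalionAI/pygmalion-6b",
]:
    print(repo, "->", suggest_runtime(repo))
```

By that heuristic, three of your four endpoints were pointed at quantized weights the default container likely could not load, which would explain errors like `ShardCannotStart` at startup.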

However, pygmalion-6b should work since it is a plain HF transformers model, so I am not sure why that one fails.