Inference Endpoints fail to start

I am struggling to deploy Inference Endpoints. I tried to start a few:

  • TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ
  • s3nh/lmsys-vicuna-7b-v1.5-16k-GGML
  • PygmalionAI/pygmalion-6b
  • TheBloke/WizardLM-7B-uncensored-GPTQ
    and all of them fail spectacularly, and I do not really know where to start debugging or searching for errors.
    I always used the recommended/automatic model suggestion, and if none was suggested I used an accelerated instance, because all of those models are either Text Completion or Conversational.
    Common errors I encountered were:
  • Error: ShardCannotStart
  • Application startup failed. Exiting.
  • And also a very general error which I cannot reproduce right now (I already deleted the endpoints)

Maybe I am too blind to see the elephant in the room, or it is camouflaging itself very well.

Many thanks in advance and have a great day
Greetings, qbin :smiley:


I am not experienced with Inference Endpoints, but many of those models are GGML or GPTQ quantizations. GGML models need a llama.cpp-based runtime rather than plain transformers, and GPTQ models need ExLlama or AutoGPTQ as a backend.
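To make the distinction concrete, here is a small illustrative sketch (not part of any Hugging Face API — the function name and the runtime labels are my own) that classifies a repo by the quantization hint in its name, which is roughly what you would do by eye before picking an endpoint:

```python
def suggest_runtime(repo_id: str) -> str:
    """Rough heuristic: guess which runtime a model repo needs
    from quantization markers in its name. Purely illustrative."""
    name = repo_id.lower()
    if "ggml" in name or "gguf" in name:
        # GGML/GGUF weights target llama.cpp-style C++ runtimes,
        # not the standard transformers loading path.
        return "llama.cpp"
    if "gptq" in name:
        # GPTQ weights need a GPTQ-aware backend.
        return "AutoGPTQ / ExLlama"
    # Plain safetensors/PyTorch checkpoints load with transformers.
    return "transformers"

for repo in [
    "TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ",
    "s3nh/lmsys-vicuna-7b-v1.5-16k-GGML",
    "PygmalionAI/pygmalion-6b",
]:
    print(repo, "->", suggest_runtime(repo))
```

By that heuristic, three of your four endpoints were pointed at quantized weights the default container likely could not load, which would explain errors like `ShardCannotStart` at startup.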

However, pygmalion-6b should work since it is a plain HF transformers model, so I am not sure why that one fails.