I'm struggling to deploy inference endpoints. I tried to start a few:
TheBloke/Wizard-Vicuna-30B-Uncensored-GPTQ
s3nh/lmsys-vicuna-7b-v1.5-16k-GGML
PygmalionAI/pygmalion-6b
TheBloke/WizardLM-7B-uncensored-GPTQ
and all of them fail spectacularly, and I don't really know where to start with debugging or how to search for the errors.
I always went with the recommended/automatic suggestion, and where none was suggested I used an accelerated instance, since all of these models are either Text Completion or Conversational.
Common errors I encountered were:
- Error: ShardCannotStart
- Application startup failed. Exiting.
- and also a very generic error which I cannot reproduce right now (I already deleted the endpoints)
Maybe I am too blind to see the elephant in the room, or it is camouflaging itself very well.
Many thanks in advance, and have a great day!
Greetings, qbin