I am trying out an Inference Endpoint for TheBloke/Llama-2-70B-Chat-GPTQ, and I am getting the following errors:
2023/11/14 07:47:31 ~ {"timestamp":"2023-11-14T12:47:31.273613Z","level":"ERROR","fields":{"message":"Shard 0 failed to start"},"target":"text_generation_launcher"}
2023/11/14 07:47:31 ~ {"timestamp":"2023-11-14T12:47:31.273635Z","level":"INFO","fields":{"message":"Shutting down shards"},"target":"text_generation_launcher"}
2023/11/14 07:47:31 ~ {"timestamp":"2023-11-14T12:47:31.278347Z","level":"ERROR","fields":{"message":"Shard complete standard error output:\n\n[W socket.cpp:601] [c10d] The client socket has failed to connect to [localhost]:29500 (errno: 99 - Cannot assign requested address).\nYou are using a model of type llama to instantiate a model of type . This is not supported for all configurations of models and can yield errors.\nTraceback (most recent call last):\n\n File
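In case it helps anyone read these logs, here is a minimal sketch for unpacking one log line so the escaped shard stderr prints as separate lines. It assumes each line has the shape `<date> <time> ~ <json>` as above; `parse_log_line` is a hypothetical helper name of mine, not part of any library.

```python
import json


def parse_log_line(line: str) -> dict:
    """Decode the JSON payload of a '<date> <time> ~ <json>' log line."""
    # Assumption: everything after the first " ~ " is a single JSON object.
    _, _, payload = line.partition(" ~ ")
    return json.loads(payload)


# First error line from the logs above, copied verbatim.
sample = (
    '2023/11/14 07:47:31 ~ {"timestamp":"2023-11-14T12:47:31.273613Z",'
    '"level":"ERROR","fields":{"message":"Shard 0 failed to start"},'
    '"target":"text_generation_launcher"}'
)

record = parse_log_line(sample)
# json.loads converts escaped "\n" sequences into real newlines, so a
# multi-line traceback in "message" prints one frame per line.
print(record["level"], record["fields"]["message"])
```

This only works on complete lines; the last log line above is cut off mid-traceback, so it would need the full text before it can be decoded.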
Any advice would be appreciated.