Error while trying to host finetuned model on inference endpoint

Hi,

I have a model that was finetuned using unsloth. I'm trying to host it on an inference endpoint, but I end up getting the following error when initialising the model on a T4 GPU with 64 GB RAM:

8:50.284772Z","level":"INFO","fields":{"message":"Waiting for shard to gracefully shutdown"},"target":"text_generation_launcher","span":{"rank":1,"name":"shard-manager"},"spans":[{"rank":1,"name":"shard-manager"}]}
{"timestamp":"2024-05-20T05:28:50.286334Z","level":"INFO","fields":{"message":"Terminating shard"},"target":"text_generation_launcher","span":{"rank":2,"name":"shard-manager"},"spans":[{"rank":2,"name":"shard-manager"}]}
{"timestamp":"2024-05-20T05:28:50.286333Z","level":"INFO","fields":{"message":"Terminating shard"},"target":"text_generation_launcher","span":{"rank":3,"name":"shard-manager"},"spans":[{"rank":3,"name":"shard-manager"}]}
{"timestamp":"2024-05-20T05:28:50.286883Z","level":"INFO","fields":{"message":"Waiting for shard to gracefully shutdown"},"target":"text_generation_launcher","span":{"rank":2,"name":"shard-manager"},"spans":[{"rank":2,"name":"shard-manager"}]}
{"timestamp":"2024-05-20T05:28:50.286915Z","level":"INFO","fields":{"message":"Waiting for shard to gracefully shutdown"},"target":"text_generation_launcher","span":{"rank":3,"name":"shard-manager"},"spans":[{"rank":3,"name":"shard-manager"}]}
{"timestamp":"2024-05-20T05:28:50.688669Z","level":"INFO","fields":{"message":"shard terminated"},"target":"text_generation_launcher","span":{"rank":3,"name":"shard-manager"},"spans":[{"rank":3,"name":"shard-manager"}]}
{"timestamp":"2024-05-20T05:28:50.879857Z","level":"INFO","fields":{"message":"shard terminated"},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
{"timestamp":"2024-05-20T05:28:50.886106Z","level":"INFO","fields":{"message":"shard terminated"},"target":"text_generation_launcher","span":{"rank":1,"name":"shard-manager"},"spans":[{"rank":1,"name":"shard-manager"}]}
{"timestamp":"2024-05-20T05:28:51.088917Z","level":"INFO","fields":{"message":"shard terminated"},"target":"text_generation_launcher","span":{"rank":2,"name":"shard-manager"},"spans":[{"rank":2,"name":"shard-manager"}]}
Error: WebserverFailed

Here is a link to the repository:

Is this due to erroneous settings in the inference endpoint, or is the uploaded repository missing something? Any help would be appreciated.

Hi @Saran12 Can you please try using this model with an A10G or A100 instance instead of the T4? Let us know if you continue experiencing issues!
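For anyone hitting the same error: one plausible explanation (an assumption, not confirmed in this thread) is a dtype mismatch — unsloth finetunes are often saved in bfloat16, which the T4 (compute capability 7.5) cannot run natively, while Ampere cards like the A10G and A100 (8.0+) can. This hypothetical sketch shows the check you could do yourself against the `torch_dtype` field of the model's config.json:

```python
def supports_dtype(torch_dtype: str, compute_capability: tuple) -> bool:
    """Return True if a GPU generation can natively run a model saved
    in the given dtype. Hypothetical helper, not part of TGI."""
    if torch_dtype == "bfloat16":
        # bf16 requires Ampere (compute capability 8.0) or newer
        return compute_capability >= (8, 0)
    # float16 / float32 checkpoints run on any recent GPU
    return True

# "torch_dtype" comes from the model repo's config.json
print(supports_dtype("bfloat16", (7, 5)))  # T4   -> False
print(supports_dtype("bfloat16", (8, 6)))  # A10G -> True
```

If the config does say `"torch_dtype": "bfloat16"`, re-saving the merged model in float16 might also make it loadable on a T4, though switching to an A10G (as suggested above) is the simpler fix.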

Yup, it's working on the A10G, thank you!
