Inference endpoint, gated repo 401 error

I’m using the Inference Endpoints website to host a fine-tuned model. Initialization fails with a message saying I don’t have authorization to access gated repos, yet I should have access.

I’ve tried Mistral and Llama fine-tuned models, and both the AWS and Google server options.

I do have an “access token” associated with my Hugging Face account so that I can pay for services like CPUs/GPUs, Inference Endpoints servers, etc.


Hi @RedFoxPanda In Inference Endpoints, you now have the ability to add an environment variable to your endpoint, which is needed if you’re deploying a fine-tuned gated model like Meta-Llama-3-8B-Instruct.

We have some additional documentation on environment variables, but the one you’d likely need is HF_TOKEN. Add HF_TOKEN as the key and your user access token as the value. User access tokens can be generated in the settings of your account.
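As a minimal sketch of what that key/value pair does: once HF_TOKEN is set in the endpoint’s environment, the model-download step inside the container can read it and present it as a Bearer credential when fetching the gated weights. The token value below is a hypothetical placeholder; in practice you set it in the endpoint UI, not in code.

```python
import os

# Hypothetical placeholder -- in Inference Endpoints you would set this
# key/value pair in the endpoint's environment-variable settings.
os.environ["HF_TOKEN"] = "hf_xxxxxxxxxxxxxxxx"

# Inside the container, tooling reads the token from the environment:
hf_token = os.environ.get("HF_TOKEN")

# ...and sends it as a Bearer credential when requesting gated repo files,
# which is what resolves the 401 on initialization.
auth_header = {"Authorization": f"Bearer {hf_token}"}
```

This is also why a token tied to billing alone isn’t enough: the endpoint needs the token at download time to prove your account has been granted access to the gated repo.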

Please let me know if you have additional questions!

This did work. I then hit an out-of-memory error, so I upgraded to a higher-tier GPU/CPU instance for the cloud machine. “Running” status is now present. Thanks.

@RedFoxPanda I’m glad to hear it! Thanks for letting me know. :hugs:

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.