Hi
I have fine-tuned a Flux.1 Dev model with the Replicate locataco toolkit, and during training I also uploaded it to Hugging Face. My question is: how can I use this fine-tuned model via the Hugging Face API?
If you know the solution, please let me know; it would be a great help to me. Thanks!
Hi,
Great to hear about your fine-tuned model! If you’ve uploaded your fine-tuned Flux.1 Dev model to Hugging Face, you can use it via the Hugging Face API with these steps:
1. Ensure the Model is Uploaded Properly
- Confirm your model is publicly or privately hosted on the Hugging Face Hub.
- Navigate to the model page (e.g., https://huggingface.co/username/model-name) and ensure it's visible and correctly uploaded.
2. Install the Required Libraries
Ensure you have the Hugging Face transformers and huggingface_hub libraries installed:
pip install transformers huggingface_hub
3. Authenticate with Hugging Face
If your model is private, generate a personal access token on Hugging Face (from your account settings) and log in via:
huggingface-cli login
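If you prefer to stay in Python, huggingface_hub also offers a programmatic login (a small sketch; the token string is a placeholder):
from huggingface_hub import login

# Programmatic alternative to `huggingface-cli login`; paste your own access token here
login(token="hf_YOUR_ACCESS_TOKEN")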
4. Load the Model and Tokenizer
Here’s an example to load and use your model:
from transformers import AutoModel, AutoTokenizer
# Replace 'username/model-name' with your model path on Hugging Face
model_name = "username/model-name"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
# Example input text
input_text = "Your input here"
# Tokenize input
inputs = tokenizer(input_text, return_tensors="pt")
# Generate output (modify as per your model's purpose, e.g., classification, generation)
outputs = model(**inputs)
print(outputs)
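Note that FLUX.1 Dev is a diffusers text-to-image model rather than a transformers model, so loading it with a diffusers pipeline is usually more appropriate. A minimal sketch, assuming your repo contains full diffusers weights (the repo id is a placeholder):
import torch
from diffusers import FluxPipeline

# Replace 'username/model-name' with your repo; this needs a GPU with enough memory
pipe = FluxPipeline.from_pretrained("username/model-name", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Generate an image from a text prompt
image = pipe("Your prompt here", num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("output.png")
If the repo only holds a LoRA, load the base model first and then call pipe.load_lora_weights("username/model-name").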
5. Using the Inference API
Alternatively, if you want to directly use the Hugging Face Inference API:
- Go to your model’s page on Hugging Face and copy the API endpoint.
- Use the following code:
import requests
API_URL = "https://api-inference.huggingface.co/models/username/model-name"
headers = {"Authorization": "Bearer YOUR_HUGGING_FACE_API_TOKEN"}
data = {"inputs": "Your input here"}
response = requests.post(API_URL, headers=headers, json=data)
print(response.json())
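For a text-to-image model such as FLUX, the Inference API returns raw image bytes rather than JSON, so a variant along these lines is needed (a sketch; URL and token are placeholders):
import requests

API_URL = "https://api-inference.huggingface.co/models/username/model-name"
headers = {"Authorization": "Bearer YOUR_HUGGING_FACE_API_TOKEN"}

# The response body is the generated image itself (e.g., PNG/JPEG bytes)
response = requests.post(API_URL, headers=headers, json={"inputs": "Your prompt here"})
response.raise_for_status()
with open("output.png", "wb") as f:
    f.write(response.content)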
Notes
- Adjust the example based on the specific use case of your model (e.g., text generation, classification, etc.).
- If your model uses a custom architecture, you may need additional libraries from the Replicate Locataco toolkit.
I hope this helps! Let me know if you have further questions.
My answer got a bit long because I don't know your exact situation.
Hope this helps!
Thanks for replying. My API endpoints are not available.
On the model page I see this:
Inference API
Unable to determine this model's library. Check the docs.
And when I try the Inference API I see this:
Bad request:
Task not found for this model
Can you please help me with how I can create a serverless Inference API for my fine-tuned model? I want to use it via the Hugging Face API. I used Replicate to fine-tune the model.
Thanks! It's resolved. I just needed to update the README.md file, and it's done. It's now working perfectly fine with the Inference API. Thanks.
This should be fine for FLUX.
---
library_name: diffusers
pipeline_tag: text-to-image
---
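Once the metadata lets the Hub resolve the library and task, the model should also be callable through huggingface_hub's InferenceClient (a sketch; the repo id is a placeholder):
from huggingface_hub import InferenceClient

# No token is needed for a public model (a token raises the rate limit)
client = InferenceClient()

# Returns a PIL image for text-to-image models
image = client.text_to_image("Your prompt here", model="username/model-name")
image.save("output.png")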
Thanks a lot, it was working absolutely fine, but now I am seeing a timeout error on the model card page, even when I click the Compute button.
If the symbol is anything other than yellow (Warm status), it basically won't work. I think it will work if you use an Endpoint or Gradio…
It seems that the shared GPU resources have run out.
I think it’s either in a cold or frozen status. In that state, it’s basically unusable from the web. That’s usually the case with personal models.
Any solution to this issue, please?
If the model size is less than 10GB, there is a method of calling it from Gradio's load() (see the sketch after this post). If you don't know much about Python, you can create it in the space below. However, since FLUX is at least 30GB or more, there is virtually no way to do this.
There is also a way to use the paid Endpoint API, but I don’t know much about it… Of course, it is possible to read the HF storage and execute the model from a local PC or cloud service with a powerful GPU.
Edit:
In the case of a FLUX LoRA, the file size of the LoRA itself is what is used for the check, so the above method can be used if the base model is set appropriately in README.md. The full FLUX model itself is too large, so it is not possible.
---
base_model: black-forest-labs/FLUX.1-dev
---
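A rough sketch of the Gradio approach mentioned above, assuming the repo's README.md declares the base model as shown (the repo id is a placeholder):
import gradio as gr

# Builds a small demo UI on top of the model's hosted inference endpoint
demo = gr.load("models/username/model-name")
demo.launch()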
Thanks john6666, you helped me a lot. The speed got better, and I now get the Warm status and its indicator as well. Is it possible to use a model that is in private mode or in access-request mode? I have fine-tuned the model on my own images and I want to make it private so nobody else can use it, ever! My preference is to use it via the serverless Inference API, that is, the Hugging Face API.
Thanks again.
(After the degradation) It is almost impossible to get a general user's model to Warm status using the Serverless Inference API. It will only become warm if it is one of the most famous models. LoRAs and smaller models are relatively easy to warm, but it is difficult to do so on purpose.
In particular, I think that models that exceed 10GB, such as FLUX, will not become Warm unless they are individually approved by HF.
And as far as I know, there is no contact point for applying for this, even for paying users. (If you pay for an Endpoint, you can use it, but that does not turn on the Serverless Inference API.) Even if such a process is ever provided, it will be in the future.
Anyway, I think it is impossible at the moment…
I understand. Can you please tell me how I can use a model that is in private mode via the Inference API? My model is working; I just want to make it private.
If the model itself is working with the Inference API, you can make the model private and then pass your Read token when calling the Inference API. Note that there are different types of tokens: if you are using the default fine-grained token, you will need to manually turn on the Inference-related permission. Using a Read token is easier.
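A minimal sketch of calling a private model with a Read token (repo id and token are placeholders):
from huggingface_hub import InferenceClient

# The Read token grants access to your private repo via the Inference API
client = InferenceClient(token="hf_YOUR_READ_TOKEN")

image = client.text_to_image("Your prompt here", model="username/model-name")
image.save("output.png")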
Thanks John, you are so helpful. I am very thankful!