Hi
I have fine-tuned a Flux.1 Dev model with the Replicate locataco toolkit, and during training I also uploaded it to Hugging Face. My question is: how can I use this fine-tuned model via the Hugging Face API?
If you know the solution, please let me know; it would be a great help to me. Thanks!
Hi,
Great to hear about your fine-tuned model! If you’ve uploaded your fine-tuned Flux.1 Dev model to Hugging Face, you can use it via the Hugging Face API with these steps:
1. Ensure the Model is Uploaded Properly
- Confirm your model is publicly or privately hosted on the Hugging Face Hub.
- Navigate to the model page (e.g., https://huggingface.co/username/model-name) and ensure it's visible and correctly uploaded.
2. Install the Required Libraries
Ensure you have the Hugging Face transformers and huggingface_hub libraries installed:
pip install transformers huggingface_hub
3. Authenticate with Hugging Face
If your model is private, generate a personal access token on Hugging Face (from your account settings) and log in via:
huggingface-cli login
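If you prefer to stay in Python, huggingface_hub also offers a programmatic login (a small sketch; the token string is a placeholder):
from huggingface_hub import login

# Programmatic alternative to `huggingface-cli login`; paste your own access token here
login(token="hf_YOUR_ACCESS_TOKEN")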
4. Load the Model and Tokenizer
Here’s an example to load and use your model:
from transformers import AutoModel, AutoTokenizer
# Replace 'username/model-name' with your model path on Hugging Face
model_name = "username/model-name"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
# Example input text
input_text = "Your input here"
# Tokenize input
inputs = tokenizer(input_text, return_tensors="pt")
# Generate output (modify as per your model's purpose, e.g., classification, generation)
outputs = model(**inputs)
print(outputs)
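Note that FLUX.1 Dev is a diffusers text-to-image model rather than a transformers model, so loading it with a diffusers pipeline is usually more appropriate. A minimal sketch, assuming your repo contains full diffusers weights (the repo id is a placeholder):
import torch
from diffusers import FluxPipeline

# Replace 'username/model-name' with your repo; this needs a GPU with enough memory
pipe = FluxPipeline.from_pretrained("username/model-name", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Generate an image from a text prompt
image = pipe("Your prompt here", num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("output.png")
If the repo only holds a LoRA, load the base model first and then call pipe.load_lora_weights("username/model-name").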
5. Using the Inference API
Alternatively, if you want to directly use the Hugging Face Inference API:
- Go to your model’s page on Hugging Face and copy the API endpoint.
- Use the following code:
import requests
API_URL = "https://api-inference.huggingface.co/models/username/model-name"
headers = {"Authorization": "Bearer YOUR_HUGGING_FACE_API_TOKEN"}
data = {"inputs": "Your input here"}
response = requests.post(API_URL, headers=headers, json=data)
print(response.json())
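For a text-to-image model such as FLUX, the Inference API returns raw image bytes rather than JSON, so a variant along these lines is needed (a sketch; URL and token are placeholders):
import requests

API_URL = "https://api-inference.huggingface.co/models/username/model-name"
headers = {"Authorization": "Bearer YOUR_HUGGING_FACE_API_TOKEN"}

# The response body is the generated image itself (e.g., PNG/JPEG bytes)
response = requests.post(API_URL, headers=headers, json={"inputs": "Your prompt here"})
response.raise_for_status()
with open("output.png", "wb") as f:
    f.write(response.content)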
Notes
- Adjust the example based on the specific use case of your model (e.g., text generation, classification, etc.).
- If your model uses a custom architecture, you may need additional libraries from the Replicate Locataco toolkit.
I hope this helps! Let me know if you have further questions.
My answer got a bit long because I don't know your exact situation.
Hope this helps!
Thanks for replying. My API endpoints are not available.
On the model page I see this:
Inference API
Unable to determine this model's library. Check the docs.
And when I try the Inference API I see this:
Bad request:
Task not found for this model
Can you please help me with how I can create a serverless Inference API for my fine-tuned model? I want to use it via the Hugging Face API. I used Replicate to fine-tune the model.
Thanks! It's resolved. I just needed to update the README.md file, and it's done. It's now working perfectly fine with the Inference API. Thanks.
This should be fine for FLUX.
---
library_name: diffusers
pipeline_tag: text-to-image
---
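Once the metadata lets the Hub resolve the library and task, the model should also be callable through huggingface_hub's InferenceClient (a sketch; the repo id is a placeholder):
from huggingface_hub import InferenceClient

# No token is needed for a public model (a token raises the rate limit)
client = InferenceClient()

# Returns a PIL image for text-to-image models
image = client.text_to_image("Your prompt here", model="username/model-name")
image.save("output.png")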
Thanks a lot, it was working absolutely fine, but now I am seeing a timeout error on the model card page, even when I click the Compute button.
If the symbol is anything other than yellow (Warm status), it basically won't work. I think it will work if you use an Endpoint or Gradio…
It seems that the shared GPU resources have run out.
I think it’s either in a cold or frozen status. In that state, it’s basically unusable from the web. That’s usually the case with personal models.
Any solution to this issue, please?
If the model size is less than 10GB, there is a method of calling it from Gradio's load() (see the sketch after this post). If you don't know much about Python, you can create it in the space below. However, since FLUX is at least 30GB or more, there is virtually no way to do this.
There is also a way to use the paid Endpoint API, but I don’t know much about it… Of course, it is possible to read the HF storage and execute the model from a local PC or cloud service with a powerful GPU.
Edit:
In the case of a FLUX LoRA, the file size of the LoRA itself is what is used for the check, so the above method can be used if the base model is set appropriately in README.md. The full FLUX model itself is too large, so it is not possible.
---
base_model: black-forest-labs/FLUX.1-dev
---
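A rough sketch of the Gradio approach mentioned above, assuming the repo's README.md declares the base model as shown (the repo id is a placeholder):
import gradio as gr

# Builds a small demo UI on top of the model's hosted inference endpoint
demo = gr.load("models/username/model-name")
demo.launch()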
Thanks john6666, you helped me a lot. The speed got better, and I now get the Warm status and its indicator as well. Is it possible to use a model that is in private mode or in access-request mode? I have fine-tuned the model on my own images and I want to make it private so nobody else can use it, ever! My preference is to use it via the serverless Inference API, that is, the Hugging Face API.
Thanks again.
(After the degradation) It is almost impossible to get a general user's model to Warm status using the Serverless Inference API. It will only become warm if it is one of the most famous models. LoRAs and smaller models are relatively easy to warm, but it is difficult to do so on purpose.
In particular, I think that models that exceed 10GB, such as FLUX, will not become Warm unless they are individually approved by HF.
And as far as I know, there is no contact point for applying for this, even for paying users. (If you pay for an Endpoint, you can use it, but that does not turn on the Serverless Inference API.) Even if such a process is ever provided, it will be in the future.
Anyway, I think it is impossible at the moment…
I understand. Can you please tell me how I can use a model that is in private mode via the Inference API? My model is working; I just want to make it private.
If the model itself is working with the Inference API, you can make the model private and then pass your Read token when calling the Inference API. Note that there are different types of tokens: if you are using the default fine-grained token, you will need to manually turn on the Inference-related permission. Using a Read token is easier.
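A minimal sketch of calling a private model with a Read token (repo id and token are placeholders):
from huggingface_hub import InferenceClient

# The Read token grants access to your private repo via the Inference API
client = InferenceClient(token="hf_YOUR_READ_TOKEN")

image = client.text_to_image("Your prompt here", model="username/model-name")
image.save("output.png")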
Thanks John, you are so helpful. I am very thankful!