The text I am generating from a prompt is less than 80 words even with parameters. What am I doing wrong?
```
async function query(data) {
  const response = await fetch(
    "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct",
    {
      headers: {
        Authorization: `Bearer ${hfToken}`,
        "Content-Type": "application/json",
      },
      method: "POST",
      body: JSON.stringify(data),
    }
  );
  const result = await response.json();
  return result;
}

query({
  inputs: "What did the US markets look like in August 2024?",
  parameters: { max_length: 200 },
}).then((response) => {
  console.log(JSON.stringify(response));
});
```
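For reference, this is roughly how I'm measuring the length of what comes back, assuming the usual `[{ generated_text: "..." }]` array the text-generation route returns:

```
// Assuming the text-generation response shape: an array with a generated_text field.
query({
  inputs: "What did the US markets look like in August 2024?",
  parameters: { max_length: 200 },
}).then((response) => {
  const text = response?.[0]?.generated_text ?? "";
  console.log(text);
  console.log("word count:", text.trim().split(/\s+/).length);
});
```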
I followed this solution: How to set minimum length of generated text in hosted API - #2 by Narsil. But that is from 2021; has the method changed since then? I may have missed it in the documentation if it exists, but I haven't seen anything so far.
`"max_length": 200`
Not sure if it's because of this option, but isn't 200 too short? If the content is the problem, you'll usually get better results by not specifying the option at all. Changing the model may also work.
I set it to 1 and then to 2000; there was no difference in output length in either case.
Maybe it’s the max_tokens you should be setting?
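If I remember the docs correctly, `max_tokens` is the name used on the OpenAI-compatible chat route, while the plain text-generation route takes `max_new_tokens` inside `parameters`. Roughly like this (values and model are just placeholders):

```
// Raw text-generation route: the length cap goes inside `parameters`.
const textGenBody = {
  inputs: "What did the US markets look like in August 2024?",
  parameters: { max_new_tokens: 200 },
};

// OpenAI-compatible chat route (.../v1/chat/completions): a top-level max_tokens.
const chatBody = {
  model: "meta-llama/Meta-Llama-3-8B-Instruct",
  messages: [{ role: "user", content: "What did the US markets look like in August 2024?" }],
  max_tokens: 200,
};
```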
Well, I can read JavaScript at least, but I'm not used to it. Try following the newer samples; the API is very different even from 2023 and 2024, with OpenAI compatibility and InferenceClient / AsyncInferenceClient now recommended over simple POST requests.
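A rough sketch of what the client-based approach looks like in JavaScript, assuming the `@huggingface/inference` package (the model and token here are placeholders, not a recommendation):

```
// Rough sketch using the @huggingface/inference client instead of a raw POST.
import { HfInference } from "@huggingface/inference";

const hf = new HfInference(hfToken);

// OpenAI-style chat completion; max_tokens bounds the generated length.
const out = await hf.chatCompletion({
  model: "meta-llama/Meta-Llama-3-8B-Instruct",
  messages: [{ role: "user", content: "What did the US markets look like in August 2024?" }],
  max_tokens: 400,
});

console.log(out.choices[0].message.content);
```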
Okay, so I did find something that worked (sort of).
query({"inputs": "What did the US markets looked in august 2018?", parameters: {
max_new_tokens: 400, return_full_text : true}}).then((response) => {
console.log(JSON.stringify(response));
I added `return_full_text: true` and started getting longer generated text. However, if I increase `max_new_tokens` to anything above 400, it gives this error:

`{"error":"Model too busy, unable to get response in less than 60 second(s)"}`

So this is where I am now.
I also tried different models; some worked, but some of them were cold, so I added the `x-wait-for-model` header.
```
headers: {
  Authorization: `Bearer ${hfToken}`,
  "Content-Type": "application/json",
  "x-wait-for-model": "true"
},
```
But the model never loaded; after some time I got a timeout error.
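For completeness, this is roughly how the whole request looks on my end now, with the snippets above put together (the token is still assumed to be defined elsewhere):

```
async function query(data) {
  const response = await fetch(
    "https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct",
    {
      headers: {
        Authorization: `Bearer ${hfToken}`,
        "Content-Type": "application/json",
        "x-wait-for-model": "true",
      },
      method: "POST",
      body: JSON.stringify(data),
    }
  );
  const result = await response.json();
  return result;
}

query({
  inputs: "What did the US markets look like in August 2018?",
  parameters: { max_new_tokens: 400, return_full_text: true },
}).then((response) => console.log(JSON.stringify(response)));
```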
Honestly a breeze working with AI. /s
As you know, the models are much bigger this year than last year, and many of them are no longer usable because the servers and internal infrastructure behind the Serverless Inference API haven't kept up. These days, if a model is marked "warm", it's just about usable.
For commercial services, I guess you can use dedicated Inference Endpoints, but for a personal hobby that's a lot of work.
We pretty much have to search by hand for a configuration the server will answer to.
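As a rough illustration of that hand search, something like this will show which candidate models actually answer; the model list and the tiny probe request are just examples:

```
// Probe a few candidate models with a tiny request and log the HTTP status.
// 200 usually means the model answered; 503 usually means it is loading or busy.
async function probe(models) {
  for (const model of models) {
    const res = await fetch(`https://api-inference.huggingface.co/models/${model}`, {
      method: "POST",
      headers: {
        Authorization: `Bearer ${hfToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ inputs: "ping", parameters: { max_new_tokens: 16 } }),
    });
    console.log(model, res.status);
  }
}

probe(["meta-llama/Meta-Llama-3-8B-Instruct", "mistralai/Mistral-7B-Instruct-v0.3"]);
```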