I am unable to adjust the generated text length

The text I am generating from a prompt comes out at less than 80 words, even when I pass generation parameters. What am I doing wrong?

```js
async function query(data) {
	const response = await fetch(
		"https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct",
		{
			headers: {
				Authorization: `Bearer ${hfToken}`,
				"Content-Type": "application/json",
			},
			method: "POST",
			body: JSON.stringify(data),
		}
	);
	const result = await response.json();
	return result;
}


query({"inputs": "What did the US markets looked in august 2024?", parameters: { 
  "max_length": 200}}).then((response) => {
  console.log(JSON.stringify(response));
});```

I followed this solution: How to set minimum length of generated text in hosted API - #2 by Narsil. But that's from 2021; has the method changed since then? I may have missed it in the documentation if it exists, but I haven't seen anything so far.


> `"max_length": 200`

Not sure if it's because of this option, but isn't 200 too short? If you're having trouble with the output, you'll usually get better results if you don't specify the option at all. Changing the model may also work.

I set it to 1 and then to 2000; there was no difference in length in either case.

Maybe it's `max_tokens` you should be setting?
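
Something like this, perhaps (just a sketch, untested: `max_new_tokens` is the name in the text-generation docs and `max_tokens` is the chat-completion-style name, so which one the backend actually honors is a guess here):

```js
// Rough sketch, untested. Reuses the query() helper from the first post.
// max_new_tokens is the text-generation / TGI parameter name;
// max_tokens is the chat-completion-style name. Both are guesses here.
query({
  inputs: "What did the US markets looked in august 2024?",
  parameters: {
    max_new_tokens: 200,
    // max_tokens: 200,
  },
}).then((response) => {
  console.log(JSON.stringify(response));
});
```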

No, still did not work.

Well, I can read JavaScript at least, but I'm not used to it. Try following this newer sample, which is quite different even from the 2023 and 2024 ones: it adds OpenAI compatibility and recommends InferenceClient / AsyncInferenceClient over plain POST requests.
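
Adapted to JavaScript it would look roughly like this (a sketch only, not tested: the `/v1/chat/completions` path is what the current docs show for the OpenAI-compatible route, `max_tokens` controls length there, and the model name is just an example):

```js
// Sketch of the OpenAI-compatible chat completions route (untested, based on the docs).
// hfToken is your Hugging Face access token; the model is an example.
async function chatQuery(prompt) {
	const response = await fetch(
		"https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct/v1/chat/completions",
		{
			method: "POST",
			headers: {
				Authorization: `Bearer ${hfToken}`,
				"Content-Type": "application/json",
			},
			body: JSON.stringify({
				model: "meta-llama/Meta-Llama-3-8B-Instruct",
				messages: [{ role: "user", content: prompt }],
				max_tokens: 400, // length control on this route
			}),
		}
	);
	return await response.json();
}

chatQuery("What did the US markets look like in August 2024?").then((result) => {
	console.log(result.choices?.[0]?.message?.content);
});
```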

Okay, so I did find something that worked (sort of).

query({"inputs": "What did the US markets looked in august 2018?", parameters: { 
  max_new_tokens: 400, return_full_text : true}}).then((response) => {
  console.log(JSON.stringify(response));

I added `return_full_text: true` and started getting longer generated text. However, if I increase `max_new_tokens` to anything above 400, I get the error
`{"error":"Model too busy, unable to get response in less than 60 second(s)"}`.

So this is where I am now.

Also, I tried different models. Some worked, but some of them were cold, so I added the `x-wait-for-model` header.

```js
headers: {
	Authorization: `Bearer ${hfToken}`,
	"Content-Type": "application/json",
	"x-wait-for-model": "true",
},
```

But the model never loaded; after some time I got a timeout error.

Honestly a breeze working with AI. /s


As you know, the models are much bigger this year than last year, so many of them are no longer usable: the servers and infrastructure behind the Serverless Inference API haven't kept up. At this point, if a model is marked "warm", it's just about usable.
For commercial services I guess you can use dedicated Inference Endpoints, but for a personal hobby that's a lot of work.
We more or less have to search by hand for a configuration the server will answer to.
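
One way to do part of that search by hand before sending the real request (a sketch; I'm assuming the Hub API's `expand[]=inference` parameter still reports warm/cold status, so treat that field as a guess):

```js
// Sketch: ask the Hub API whether a model is currently reported as "warm".
// The expand[]=inference field is an assumption based on what the Hub returns today.
async function isWarm(modelId) {
	const res = await fetch(
		`https://huggingface.co/api/models/${modelId}?expand[]=inference`
	);
	const info = await res.json();
	return info.inference === "warm";
}

isWarm("meta-llama/Meta-Llama-3-8B-Instruct").then((warm) => {
	console.log(warm ? "warm, worth querying" : "cold, probably not worth waiting for");
});
```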
