Hi.
I’m a beginner at NLP.
I’m trying to summarize sentences with a T5 model through the Inference API.
The message “Model is currently loading” keeps popping up and the request does not proceed.
Can you tell me what this error means?
@Doogie
Hello
The Inference API loads models on demand, so if it’s your first time using a model in a while, it will be loaded first; you can then retry the request after a couple of seconds.
wait_for_model is documented in the link shared above.
If wait_for_model is false, you will get a 503 while the model is loading. If it’s true, your process will hang waiting for the response, which might take a while as the model loads. You can also pin models for instant loading (see Hugging Face – Pricing).
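For example, a request along these lines should wait instead of returning a 503 (a minimal sketch in Python using the requests library; the model id t5-small and the token value are placeholders, not something specific to your setup):

```python
import requests

# Hosted Inference API endpoint; t5-small is just an example model id.
API_URL = "https://api-inference.huggingface.co/models/t5-small"
headers = {"Authorization": "Bearer hf_xxx"}  # replace with your own API token

payload = {
    "inputs": "A long passage of text to summarize ...",
    # wait_for_model goes under "options"; with True, the request blocks
    # until the model has finished loading instead of returning a 503.
    "options": {"wait_for_model": True},
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```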
I get a message that wait_for_model is no longer valid:
{'inputs': {'past_user_inputs': [], 'generated_responses': [], 'text': 'yo'}, 'parameters': {'min_length': 1, 'max_length': 500, 'repetition_penalty': 50.0, 'temperature': 50.0, 'use_cache': True, 'wait_for_model': True}}
{"error": "The following model_kwargs are not used by the model: ['wait_for_model'] (note: typos in the generate arguments will also show up in this list)"}
Hi. With “togethercomputer/GPT-NeoXT-Chat-Base-20B” I’m setting the “wait_for_model” parameter to true, but I still get the “Model is currently loading” error. Is it because the model is too big?
I’ve set the wait_for_model parameter to True in the payload in the same way as @deseipel, and it doesn’t work for me either. I don’t get a specific error about the request; I just get the usual 503 response: “Model is currently loading”.
It’s not that the ‘parameters’ dictionary needs to be renamed ‘options’; rather, ‘options’ is a separate top-level key alongside ‘parameters’, and it is the one that takes the ‘use_cache’ and ‘wait_for_model’ entries.
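So the payload from the earlier post would look something like this (a sketch reusing the values shown above; only the placement of use_cache and wait_for_model changes):

```python
payload = {
    "inputs": {
        "past_user_inputs": [],
        "generated_responses": [],
        "text": "yo",
    },
    "parameters": {
        "min_length": 1,
        "max_length": 500,
        "repetition_penalty": 50.0,
        "temperature": 50.0,
    },
    # use_cache and wait_for_model live under "options", not "parameters",
    # so they are no longer forwarded to generate() as unknown kwargs.
    "options": {"use_cache": True, "wait_for_model": True},
}
```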