I am using the hosted Inference API to test my text-to-text model, but the model output is cut off.
My local model, using the translation pipeline, returns a full sentence with 213 characters, but the model on HuggingFace Hub returns only the first 47 characters.
I also tried the Inference Endpoint, same thing.
Any idea why?
I figured out why: locally I was using ‘translation’ as the task, but the endpoint is using ‘text-to-text generation’.
Are you using the widgets or sending requests? When using the widgets, the model uses the default generate arguments, which have a "small" limit on output length (max_new_tokens / max_length).
You can customize those arguments during inference by adding parameters to your request. For inference endpoints you can find the documentation here: Supported Transformers Tasks
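For reference, a minimal sketch of what such a request could look like, assuming the endpoint accepts a JSON body with an "inputs" field and a "parameters" object that is forwarded to generate. The helper names build_payload and query are made up for illustration, and the value 256 is an arbitrary example, not a recommended setting.

```python
import json
import urllib.request


def build_payload(text, max_new_tokens=256):
    """Build a request body that overrides the default output length."""
    return {
        "inputs": text,
        # Assumption: the deployed pipeline passes these on to generate().
        "parameters": {"max_new_tokens": max_new_tokens},
    }


def query(url, token, payload):
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

You would call query(url, token, build_payload("your input text")) with your own endpoint URL and access token. If the output is still truncated, it may be worth checking which length parameter the task you deployed actually honors.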
Yes @philschmid. That fixed it! Thanks!
I was reading this doc earlier: Detailed parameters, and couldn’t find the parameter. It would be great if the parameters could be consolidated into a single doc.