Model output is cut off

I am using the Hosted Inference API to test my text-to-text model, but the model output is cut off.

My local model, using the translation pipeline, returns a full sentence with 213 characters, but the model on HuggingFace Hub returns only the first 47 characters.

I also tried the Inference Endpoint, same thing.

Any idea why?

I figured out why. Locally I was using ‘translation’ as the task, but the endpoint is using ‘text-to-text generation’.

Are you using the widgets or sending requests? When you use the widgets, the model runs with the default generate arguments, which set a small limit on the output length (max_new_tokens / max_length).

You can customize those arguments during inference by adding parameters to your request. For Inference Endpoints, you can find the documentation here: Supported Transformers Tasks
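As a rough illustration of adding parameters to a request, here is a minimal sketch using only the Python standard library. The URL and token are hypothetical placeholders, the helper names `build_request` and `query` are made up for this example, and `max_new_tokens` is an assumption about which generation argument your task accepts (some tasks document `max_length` instead), so check the linked documentation for your task.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own endpoint URL and access token.
ENDPOINT_URL = "https://example.invalid/my-endpoint"
API_TOKEN = "hf_xxx"


def build_request(text, max_new_tokens=256, url=ENDPOINT_URL, token=API_TOKEN):
    """Build (but do not send) a POST request that overrides the output length."""
    payload = {
        "inputs": text,
        # Generation arguments go under "parameters". The key name is an
        # assumption; your task may expect "max_length" instead.
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(text, **kwargs):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(text, **kwargs)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `query("some input text", max_new_tokens=256)` against a real endpoint should then return the longer output, provided the limit is large enough for the full sentence.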


Yes @philschmid. That fixed it! Thanks!

I was reading this doc earlier: Detailed parameters, and couldn’t find the parameter there. It would be great if the parameters could be consolidated into a single doc.
