I’m deploying my text-generation models using Inference Endpoints. It looks like the settings page changed about a day ago.
The old configuration looked like this:
The new configuration tab no longer allows setting Max Input Length (per Query), Max Number of Tokens (per Query), Max Batch Prefill Tokens, or Max Batch Total Tokens.
How can these be set in the new version of the page?
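For reference, these four fields correspond to launcher settings in text-generation-inference (TGI), which can also be supplied as environment variables. Assuming the new UI still exposes an environment-variables section for the container, something like this might reproduce the old configuration (the values below are placeholders, not recommendations):

```shell
# TGI launcher settings as environment variables, matching the old UI fields.
# Adjust the values to your model and hardware.
MAX_INPUT_LENGTH=1024          # Max Input Length (per Query)
MAX_TOTAL_TOKENS=2048          # Max Number of Tokens (per Query)
MAX_BATCH_PREFILL_TOKENS=4096  # Max Batch Prefill Tokens
MAX_BATCH_TOTAL_TOKENS=16384   # Max Batch Total Tokens
```

The same settings exist as `text-generation-launcher` flags (`--max-input-length`, `--max-total-tokens`, `--max-batch-prefill-tokens`, `--max-batch-total-tokens`) if the endpoint accepts custom launch arguments instead.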