Different parameters between JSON inference and Inference API


I’m running inference with a T5 model (text2text-generation task) in two ways: with raw JSON requests and with the Inference API client.

I noticed that I can use the parameters num_beams and max_length with raw JSON requests, but not with the Inference API client (here, the Inference API parameters for the text-generation task, which are the same as for the text2text-generation task). Strange…

Am I right? If so, will the Inference API parameters be updated?

JSON inference

(I can use the parameters num_beams and max_length)

import json
import requests

API_URL = "https://api-inference.huggingface.co/models/" + model_name
headers = {"Authorization": f"Bearer {API_TOKEN}"}

def query(payload):
    data = json.dumps(payload)
    response = requests.request("POST", API_URL, headers=headers, data=data)
    return json.loads(response.content.decode("utf-8")), response.headers.get("x-compute-type")

# get inference
data, x_compute_type = query(
    {
        "inputs": input_text,
        "parameters": {
            "num_beams": num_beams,
            "num_return_sequences": num_return_sequences,
            "max_length": max_target_length,
        },
    }
)

Inference API

(I cannot use the parameters num_beams and max_length)

!pip install huggingface_hub

from huggingface_hub.inference_api import InferenceApi
inference = InferenceApi(repo_id=model_name, token=API_TOKEN)

params = {
    # "num_beams": num_beams,            # not among the documented parameters
    "num_return_sequences": num_return_sequences,
    # "max_length": max_target_length,   # not among the documented parameters
}

inference(input_text, params)
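For what it’s worth, the raw-requests path above suggests these parameters are just fields inside the request body. Here is a minimal sketch of the payload that path serializes (the variable values are made-up placeholders; whether the hosted endpoint actually honors num_beams and max_length for a given task is exactly my question):

```python
import json

# Hypothetical stand-ins for the variables used in the snippets above.
input_text = "translate English to German: Hello"
num_beams = 4
num_return_sequences = 2
max_target_length = 64

# The body the raw-requests path sends; if the InferenceApi client forwards
# its `params` argument verbatim, the same "parameters" dict should reach
# the server either way.
payload = {
    "inputs": input_text,
    "parameters": {
        "num_beams": num_beams,
        "num_return_sequences": num_return_sequences,
        "max_length": max_target_length,
    },
}

body = json.dumps(payload)
print(body)
```

So the question reduces to whether the server-side pipeline accepts these extra generation parameters, not to anything the client could change about the payload shape.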