I’m trying to use the Inference API to fill in multiple words in a mask at once. I’ve done this before in Python with the T5 model, where you specify the maximum number of tokens that may fill the mask via max_length:
# input_ids come from a T5 tokenizer and include a <extra_id_0> sentinel
# marking the span to fill, e.g. "The answer to the universe is <extra_id_0>."
outputs = t5_mlm.generate(input_ids=input_ids,
                          num_beams=200, num_return_sequences=20,
                          max_length=5)  # caps how many tokens fill the mask
But I don’t see any way to do that in the Inference API. I can generate single-word mask fills with bert-base-uncased:
import fetch from "node-fetch";

// Hugging Face API token, e.g. from an environment variable
const API_TOKEN = process.env.HF_API_TOKEN;

async function query(data) {
  const response = await fetch(
    "https://api-inference.huggingface.co/models/bert-base-uncased",
    {
      headers: { Authorization: `Bearer ${API_TOKEN}` },
      method: "POST",
      body: JSON.stringify(data),
    }
  );
  const result = await response.json();
  return result;
}

query({ inputs: "The answer to the universe is [MASK]." }).then((response) => {
  console.log(JSON.stringify(response));
});
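The response comes back as a list of single-token candidates, shaped roughly like this (values illustrative, not my actual scores):

[
  { "sequence": "the answer to the universe is no.", "score": 0.17, "token": 2053, "token_str": "no" },
  ...
]

So each [MASK] gets exactly one token per candidate.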
But I don’t see where max_length (or anything like it) would go in this request, if it’s possible at all.
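The only thing I could think to try is a parameters object, by analogy with the generation tasks in the detailed parameters docs, but that’s pure guesswork on my part; I haven’t seen max_length documented for fill-mask:

query({
  inputs: "The answer to the universe is [MASK].",
  // guessed by analogy with the text-generation parameters;
  // I have no evidence fill-mask honors max_length
  parameters: { max_length: 5 },
}).then((response) => {
  console.log(JSON.stringify(response));
});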
I also considered T0pp, which has an Inference API endpoint, but I can’t get it to generate anything sensible for mask filling.
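For reference, this is roughly what I’ve been sending to it (same import and API_TOKEN as above; max_new_tokens comes from the text-generation section of the detailed parameters docs, and I’m only assuming it applies to T0pp):

async function queryT0pp(data) {
  const response = await fetch(
    "https://api-inference.huggingface.co/models/bigscience/T0pp",
    {
      headers: { Authorization: `Bearer ${API_TOKEN}` },
      method: "POST",
      body: JSON.stringify(data),
    }
  );
  return response.json();
}

queryT0pp({
  inputs: "Fill in the blank: The answer to the universe is ___.",
  parameters: { max_new_tokens: 5 }, // assuming generation parameters apply here
}).then((response) => {
  console.log(JSON.stringify(response));
});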