Two texts inputs for Text Classification in Inference API?

Hello!

I’ve trained a model to classify whether two texts are related, so I need to pass two independent strings as input.
Using the tokenizer in code I can pass two strings to the model, but I cannot reproduce that behaviour in the “Hosted Inference API” widget.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("my-model")
model = AutoModelForSequenceClassification.from_pretrained("my-model")

# Passing two separate strings makes the tokenizer encode them as a sentence pair
inputs = tokenizer('I love you', 'I like you', return_tensors='pt')
model(**inputs)

When the tokenizer is called with two separate strings, the token_type_ids distinguish between text 1 and text 2:

{'input_ids': tensor([[ 101, 1045, 2293, 2017,  102, 1045, 2066, 2017,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 1, 1, 1, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}

However, when passing a single string containing [SEP], this is not the case, and the behaviour matches what the “Inference API” produces:

{'input_ids': tensor([[ 101, 1045, 2293, 2017,  101, 1045, 2066, 2017,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}

Would it be possible to add a second input text to the “Hosted Inference API” widget, in order to reproduce the behaviour that is possible in code?

Thanks!

Hello,

You’re right: the hosted text classification widget only takes one text input, while some text classification models take multiple text inputs. I opened a feature request for a more flexible behaviour. In the meantime, if you want this for demonstration purposes, you can create a Space based on your model.
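As a local workaround, recent versions of transformers accept a dict with "text" and "text_pair" keys in the text-classification pipeline, which is tokenized as a sentence pair (the same token_type_ids behaviour shown above). A minimal sketch, assuming such a transformers version; the make_pair_input helper is hypothetical and only illustrates the input shape:

```python
import json

def make_pair_input(text, text_pair):
    # Hypothetical helper: package two texts in the dict format that
    # transformers' text-classification pipeline can consume locally, e.g.
    #   classifier = pipeline("text-classification", model="my-model")
    #   classifier({"text": "I love you", "text_pair": "I like you"})
    return {"text": text, "text_pair": text_pair}

pair = make_pair_input("I love you", "I like you")
print(json.dumps(pair))
# → {"text": "I love you", "text_pair": "I like you"}
```

With this input shape the tokenizer receives the two texts as separate segments, rather than one concatenated string with [SEP] in it.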