Two texts inputs for Text Classification in Inference API?

Hello!

I’ve trained a model to classify whether two texts are related to each other, so I need to pass two independent strings as input.
Using the tokenizer in code, I can pass two strings to the model, but I cannot reproduce that behaviour in the “Hosted Inference API” widget.

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("my-model")
model = AutoModelForSequenceClassification.from_pretrained("my-model")

# Passing two strings encodes them as a sentence pair
inputs = tokenizer('I love you', 'I like you', return_tensors='pt')
model(**inputs)

When calling the tokenizer with two separate strings, the token_type_ids distinguish text 1 from text 2:

{'input_ids': tensor([[ 101, 1045, 2293, 2017,  102, 1045, 2066, 2017,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 1, 1, 1, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}

However, when passing a single string containing [SEP], the two segments are not distinguished, which matches the behaviour of the “Inference API” widget:

{'input_ids': tensor([[ 101, 1045, 2293, 2017,  101, 1045, 2066, 2017,  102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}
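The pair-encoding rule behind that difference can be sketched in plain Python. This is a simplified illustration of BERT-style segment ids, not the actual transformers implementation:

```python
def encode_pair(tokens_a, tokens_b):
    """Mimic BERT-style pair encoding: [CLS] A [SEP] B [SEP],
    with token_type_id 0 for the first segment and 1 for the second."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    token_type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, token_type_ids

tokens, segs = encode_pair(["i", "love", "you"], ["i", "like", "you"])
# segs -> [0, 0, 0, 0, 0, 1, 1, 1, 1], matching the two-string call above
```

A single string with a literal [SEP] never goes through this pair path, so every position keeps segment id 0.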

Would it be possible to add a second text input to the “Hosted Inference API” widget, so it reproduces the behaviour that code allows?

Thanks!


Hello,

You’re right: the hosted text-classification widget only takes one text, while some text-classification models take a pair of inputs. I opened a feature request for more flexible behaviour. In the meantime, if you want this for demonstration purposes, you can create a Space based on your model.