Two text inputs for Text Classification in the Inference API?
Hello!
I’ve trained a model to classify whether two texts are related, so the model needs two independent strings as input. Using the tokenizer I can pass the two strings as a pair in code, but I can’t reproduce that behaviour in the “Hosted Inference API”.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("my-model")
model = AutoModelForSequenceClassification.from_pretrained("my-model")
inputs = tokenizer("I love you", "I like you", return_tensors="pt")
model(**inputs)
When the tokenizer is called with two separate strings, the token_type_ids distinguish text 1 from text 2:
{'input_ids': tensor([[ 101, 1045, 2293, 2017, 102, 1045, 2066, 2017, 102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 1, 1, 1, 1]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}
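To illustrate what the pair encoding does, here is a minimal sketch of the segmentation logic in plain Python. This is a hypothetical helper for demonstration only, not the actual transformers implementation; the token ids 101/102 are BERT’s [CLS]/[SEP]:

```python
# Sketch of how a BERT-style tokenizer assigns token_type_ids to a text pair.
# Hypothetical helper, NOT the transformers implementation.
CLS, SEP = 101, 102  # BERT special-token ids

def encode_pair(ids_a, ids_b):
    # Layout: [CLS] A [SEP] B [SEP]
    # Segment 0 covers "[CLS] A [SEP]", segment 1 covers "B [SEP]".
    input_ids = [CLS] + ids_a + [SEP] + ids_b + [SEP]
    token_type_ids = [0] * (len(ids_a) + 2) + [1] * (len(ids_b) + 1)
    attention_mask = [1] * len(input_ids)
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_mask,
    }

# Token ids for "i love you" / "i like you" as in the output above.
enc = encode_pair([1045, 2293, 2017], [1045, 2066, 2017])
print(enc["token_type_ids"])  # [0, 0, 0, 0, 0, 1, 1, 1, 1]
```

This reproduces the token_type_ids pattern shown above, which is what the model was trained to rely on.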
However, passing a single string with a literal [SEP] does not produce that, and the behaviour matches what the “Inference API” does:
{'input_ids': tensor([[ 101, 1045, 2293, 2017, 101, 1045, 2066, 2017, 102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}
Would it be possible to add a second input text to the “Hosted Inference API”, so that it reproduces the behaviour the code allows?
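For reference, a payload shape along these lines would cover the use case. This is purely hypothetical: the field names mirror the tokenizer’s `text` / `text_pair` arguments, and as far as I can tell this is not supported today:

```json
{
  "inputs": {
    "text": "I love you",
    "text_pair": "I like you"
  }
}
```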
Thanks!