How to use Pipeline with re-ranker model and ORTModelForSequenceClassification

Hi all,

I am stuck on using a pipeline with the re-ranker model “cross-encoder/ms-marco-MiniLM-L-6-v2”. I know it falls under the “text-classification” models, but I can't figure out how to pass the two list inputs, query and paragraph, for inference within the pipeline object.

Hi @Matthieu ,

Thanks for reporting! Right, the Optimum documentation is missing a reference to the Transformers documentation for pipeline usage. The relevant documentation is here: Pipelines

Here is a sample code:

from transformers import pipeline as tf_pipeline

# apply "none" to get logits as output
pipe = tf_pipeline(task="text-classification", model="cross-encoder/ms-marco-MiniLM-L-6-v2", function_to_apply="none")

##

res = pipe({"text": "How many people live in Berlin?", "text_pair": "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."})
print(res)

res = pipe({"text": "How many people live in Berlin?", "text_pair": "New York City is famous for the Metropolitan Museum of Art."})

print(res)

##
from transformers import AutoTokenizer

from optimum.pipelines import pipeline as ort_pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

# the ONNX pipeline needs an explicitly loaded tokenizer
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")

ort_model = ORTModelForSequenceClassification.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2", from_transformers=True)
ort_pipe = ort_pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer, function_to_apply="none")

##
res = ort_pipe({"text": "How many people live in Berlin?", "text_pair": "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."})
print(res)

res = ort_pipe({"text": "How many people live in Berlin?", "text_pair": "New York City is famous for the Metropolitan Museum of Art."})

print(res)

Something I think is missing for tasks such as MS MARCO passage ranking, which cross-encoder/ms-marco-MiniLM-L-6-v2 is designed for, is that the tokenizer expects text and text_pair lists of the same length. So, as far as I know, it is not possible to pass a single query together with many passages directly; there is a workaround shown in the example on cross-encoder/ms-marco-MiniLM-L-6-v2 · Hugging Face, but it's quite inefficient.
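For illustration, here is a minimal sketch of that workaround using the ort_pipe defined above (this is my reading of the model card example, so treat it as a sketch rather than the official recipe): the query is paired with every candidate passage, i.e. effectively repeated once per passage, and the resulting scores are then used to re-rank the passages.

query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "New York City is famous for the Metropolitan Museum of Art.",
]

# pair the (repeated) query with each candidate passage
pairs = [{"text": query, "text_pair": passage} for passage in passages]
scores = ort_pipe(pairs)  # one {"label", "score"} dict per pair; score is the raw logit

# sort the passages by descending relevance score
ranked = sorted(zip(passages, scores), key=lambda item: item[1]["score"], reverse=True)
for passage, result in ranked:
    print(f'{result["score"]:.3f}  {passage}')

Each pair still re-encodes the query, which is what makes this inefficient compared to a bi-encoder retriever, but for a handful of candidate passages it is usually acceptable.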

If you are looking to do batch size = 1 inference, i.e. one query with one passage at a time, you are perfectly fine though!