Hi all,
I am stuck on how to use the pipeline with the re-ranker model “cross-encoder/ms-marco-MiniLM-L-6-v2”. I know it falls under the “text-classification” task, but I can’t figure out how to pass the two list inputs, query and paragraph, for inference within the pipeline object.
Hi @Matthieu ,
Thanks for reporting! Right, the Optimum documentation is missing a reference to the Transformers documentation for pipeline usage. The relevant documentation is: Pipelines
Here is some sample code:
from transformers import pipeline as tf_pipeline
# use function_to_apply="none" to get the raw logits as output
pipe = tf_pipeline(task="text-classification", model="cross-encoder/ms-marco-MiniLM-L-6-v2", function_to_apply="none")
##
res = pipe({"text": "How many people live in Berlin?", "text_pair": "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."})
print(res)
res = pipe({"text": "How many people live in Berlin?", "text_pair": "New York City is famous for the Metropolitan Museum of Art."})
print(res)
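# Note: each result is a dict (or a list with one dict, depending on the
# transformers version) with "label" and "score" keys; with
# function_to_apply="none" the "score" is the raw relevance logit, so a
# higher value means a more relevant passage.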
##
from optimum.pipelines import pipeline as ort_pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer
# load the tokenizer explicitly and export the model to ONNX on the fly
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")
ort_model = ORTModelForSequenceClassification.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2", from_transformers=True)
ort_pipe = ort_pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer, function_to_apply="none")
##
res = ort_pipe({"text": "How many people live in Berlin?", "text_pair": "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers."})
print(res)
res = ort_pipe({"text": "How many people live in Berlin?", "text_pair": "New York City is famous for the Metropolitan Museum of Art."})
print(res)
Something I think is missing for tasks like MS MARCO passage ranking, which cross-encoder/ms-marco-MiniLM-L-6-v2 is meant for, is that the tokenizer expects text and text_pair to be lists of the same length. Therefore it is, AFAIK, not possible to pass a single query together with many passages directly; you can use a workaround like the example on cross-encoder/ms-marco-MiniLM-L-6-v2 · Hugging Face, but it is very inefficient.
If you are only doing batch size = 1 inference (one query, one passage), you are fine though!
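For one query against several passages, here is a minimal sketch of that kind of workaround: repeat the query once per passage so that text and text_pair line up, then sort by the returned logits. The rank_passages helper and the example passages are just illustrative, assuming the pipeline accepts a list of {"text", "text_pair"} dicts as a batch:

from transformers import pipeline

pipe = pipeline(task="text-classification", model="cross-encoder/ms-marco-MiniLM-L-6-v2", function_to_apply="none")

def rank_passages(query, passages):
    # repeat the query so that every passage gets paired with it
    inputs = [{"text": query, "text_pair": passage} for passage in passages]
    results = pipe(inputs)
    # with function_to_apply="none", "score" holds the raw relevance logit
    scores = [res["score"] for res in results]
    # sort passages from most to least relevant
    return sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)

query = "How many people live in Berlin?"
passages = [
    "Berlin has a population of 3,520,031 registered inhabitants in an area of 891.82 square kilometers.",
    "New York City is famous for the Metropolitan Museum of Art.",
]
for passage, score in rank_passages(query, passages):
    print(f"{score:.2f}  {passage}")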