Pipelines for multiple inputs don't produce reliable results

Jeremias · October 2, 2021, 10:40pm

I am using a text classification pipeline (‘sentiment-analysis’) with a fine-tuned ELECTRA model and transformers version 4.5.1.
For some reason, calling the pipeline on a list of inputs gives different outputs for each input than applying the pipeline to each input individually! Why is that? I went through the patch notes but couldn’t see any fix for this issue, so I’m not sure whether it still persists in recent versions.
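
A minimal sketch of how the discrepancy described above could be measured. It assumes `classify` is any callable that takes a list of strings and returns one `{'label': ..., 'score': ...}` dict per input, which is the shape a text classification pipeline returns. The helper name `max_score_gap` is made up for illustration and is not part of transformers.

```python
from typing import Callable, Dict, List

Prediction = Dict[str, object]


def max_score_gap(
    classify: Callable[[List[str]], List[Prediction]],
    texts: List[str],
) -> float:
    """Largest absolute score difference between batched and one-by-one calls.

    `classify` is assumed to behave like a text classification pipeline:
    list of strings in, list of {'label', 'score'} dicts out.
    Returns infinity if any predicted label differs between the two runs.
    """
    # One call with the whole list.
    batched = classify(texts)
    # One call per input, each wrapped in a single-element list.
    single = [classify([text])[0] for text in texts]

    gaps = []
    for b, s in zip(batched, single):
        if b["label"] != s["label"]:
            gaps.append(float("inf"))
        else:
            gaps.append(abs(float(b["score"]) - float(s["score"])))
    return max(gaps, default=0.0)
```

With a real pipeline you would pass the pipeline object itself as `classify`. Whether the list is actually run as one batch depends on the transformers version and its batching settings, so treat the number as a diagnostic rather than a guarantee.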

Jeremias · October 2, 2021, 11:15pm

Okay, so I have narrowed this down: the issue is not related to pipelines, but to ELECTRA. ELECTRA changes its outputs for different batch sizes.

Jeremias · October 3, 2021, 1:32pm

I found out that this is not related to the transformers package. It is probably due to PyTorch optimisations: operations may happen in a different order depending on the input tensor, and because floating-point operations are inexact, the same input to a pipeline can produce slightly different results when it is combined with other input sentences. As this depends not (only) on the shape of the tensor but also on its content, the only safe way to generate the exact same output is to apply the pipeline to one input sentence at a time.
For most use cases, class probabilities changing by around 10^-6 doesn’t matter, but if you require exact results, be aware of this issue!
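
To illustrate the point about operation order, here is a small pure-Python sketch (no PyTorch involved). It shows that floating-point addition gives different results depending on how the terms are grouped, and how scores could be compared with a tolerance instead of exact equality. The helper `scores_match` and the default tolerance of 1e-5 are assumptions chosen for illustration, not anything provided by transformers.

```python
import math

# Floating-point addition is not associative: the grouping changes
# where rounding happens, so the two sums differ in the last bits.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False


def scores_match(a: float, b: float, abs_tol: float = 1e-5) -> bool:
    """Treat two probabilities as equal if they differ by at most abs_tol.

    abs_tol is a hypothetical default; pick one that suits your use case.
    """
    return math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol)
```

If downstream logic depends on scores being identical across runs, comparing with a tolerance like this (or rounding before comparing) is usually more robust than `==`.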
