Summariser pipeline giving different results on the same model with a fixed seed

Hello,

I am using a summarisation pipeline and noticed the following: if I train a model for 3 epochs and then, at the end, evaluate the checkpoint from each of the 3 epochs with fixed random seeds, I get different results depending on whether I restart the Python console between evaluations (3 separate runs) or load each epoch's model into the same summariser object in a loop. I would like to understand why this happens.
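
To make the loop version concrete, it looks roughly like this (the checkpoint paths, the texts list, and the ROUGE step are placeholders, not my real script):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

texts = ["..."]  # placeholder for the real evaluation set

# placeholder names for the checkpoints saved after each epoch
checkpoints = ["output/epoch-1", "output/epoch-2", "output/epoch-3"]

for ckpt in checkpoints:
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)
    model.eval()
    summarizer = pipeline(
        "summarization", model=model, tokenizer=tokenizer,
        num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
    )
    outputs = summarizer(texts, truncation=True)
    summaries = [o["summary_text"] for o in outputs]
    # ... compute ROUGE for this epoch's summaries here ...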

My actual results are ROUGE scores on a large dataset, but I have put together this small reproducible example to show the issue. Instead of using the weights of the same model at different training epochs, I demonstrate it with two different summarization models, but the effect is the same. Grateful for any help.

Notice how in the first run I first use the facebook/bart-large-cnn model and then the lidiya/bart-large-xsum-samsum model without restarting the Python terminal. In the second run I only use the lidiya/bart-large-xsum-samsum model, and I get a different output than in the first run.

NOTE: this reproducible example won’t work on a CPU-only machine, as the CPU path doesn’t seem sensitive to torch.use_deterministic_algorithms(True) and may give different results on every run.

FIRST RUN

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch

text = """
The veteran retailer Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation, describing the lack of action as “horrifying”, with a prime minister “on shore leave” leaving a situation where “nobody is in charge”.
Responding to July’s 10.1% headline rate, the Conservative peer and Asda chair said: “We have been very, very slow in recognising this train coming down the tunnel and it’s run quite a lot of people over and we now have to deal with the aftermath.”
Attacking a lack of leadership while Boris Johnson is away on holiday, he said: “We’ve got to have some action. The captain of the ship is on shore leave, right, nobody’s in charge at the moment.”
Lord Rose, who is a former boss of Marks & Spencer, said action was needed to kill “pernicious” inflation, which he said “erodes wealth over time”. He dismissed claims by the Tory leadership candidate Liz Truss’s camp that it would be possible for the UK to grow its way out of the crisis.
"""

seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")
model.eval()
summarizer = pipeline(
    "summarization", model=model, tokenizer=tokenizer, 
    num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)

# the first model's summary is generated but not printed
output = summarizer(text, truncation=True)

# load the second model in the same Python session
tokenizer = AutoTokenizer.from_pretrained("lidiya/bart-large-xsum-samsum")
model = AutoModelForSeq2SeqLM.from_pretrained("lidiya/bart-large-xsum-samsum")
model.eval()
summarizer = pipeline(
    "summarization", model=model, tokenizer=tokenizer, 
    num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)

output = summarizer(text, truncation=True)
print(output)

The output is:

[{'summary_text': 'The UK economy is in crisis because of inflation. The government has been slow to react to it. Boris Johnson is on holiday.'}]

SECOND RUN (must restart Python)

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline
import torch

text = """
The veteran retailer Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation, describing the lack of action as “horrifying”, with a prime minister “on shore leave” leaving a situation where “nobody is in charge”.
Responding to July’s 10.1% headline rate, the Conservative peer and Asda chair said: “We have been very, very slow in recognising this train coming down the tunnel and it’s run quite a lot of people over and we now have to deal with the aftermath.”
Attacking a lack of leadership while Boris Johnson is away on holiday, he said: “We’ve got to have some action. The captain of the ship is on shore leave, right, nobody’s in charge at the moment.”
Lord Rose, who is a former boss of Marks & Spencer, said action was needed to kill “pernicious” inflation, which he said “erodes wealth over time”. He dismissed claims by the Tory leadership candidate Liz Truss’s camp that it would be possible for the UK to grow its way out of the crisis.
"""

seed = 42
torch.cuda.manual_seed_all(seed)
torch.use_deterministic_algorithms(True)

tokenizer = AutoTokenizer.from_pretrained("lidiya/bart-large-xsum-samsum")
model = AutoModelForSeq2SeqLM.from_pretrained("lidiya/bart-large-xsum-samsum")
model.eval()
summarizer = pipeline(
    "summarization", model=model, tokenizer=tokenizer, 
    num_beams=5, do_sample=True, no_repeat_ngram_size=3, device=0
)

output = summarizer(text, truncation=True)
print(output)

The output is:

[{'summary_text': 'The government has been slow to deal with inflation. Stuart Rose has urged the government to do more to shield the poorest from double-digit inflation.'}]

It looks like the summariser in the first run somehow keeps some memory of the facebook/bart-large-cnn model. Why is this happening?
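
The only guess I have is that the first generation call advances the CUDA random state used for sampling, so the second model starts generating from a different random state than it does in the second run. I have not verified this, but if that were the cause I would expect re-seeding immediately before every call (using the variables from the scripts above) to make the two runs agree:

# untested: re-seed right before each generation instead of only once at the start
torch.cuda.manual_seed_all(seed)
output = summarizer(text, truncation=True)
print(output)

Is that the right explanation, or is some other state carried over between the two pipelines?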