T5 models have non-deterministic outputs even after disabling dropout

I observe that T5 models produce different forward outputs given the same input, even after disabling the dropout layers. I’m curious whether there is any other source of randomness in the forward pass besides dropout.

(P.S. The issue above occurs when the model is in train mode, i.e. after running model.train(). Naturally, if I run model.eval(), the outputs are all identical.)

The following script reproduces the issue:

from transformers import T5Tokenizer, T5ForConditionalGeneration
from trl.trainer.utils import disable_dropout_in_model
from datasets import load_dataset

tokenizer = T5Tokenizer.from_pretrained('google-t5/t5-base')
model = T5ForConditionalGeneration.from_pretrained('google-t5/t5-base')

# here I use xsum dataset for summarization
ds = load_dataset('EdinburghNLP/xsum', split='train')
prompt = "Summarize: " + ds[0]['document']

tokenized_dataset = tokenizer(prompt, truncation=True, padding='max_length', max_length=1024, return_tensors='pt')
source_ids = tokenized_dataset['input_ids']
source_mask = tokenized_dataset['attention_mask']

eos_token_id = [tokenizer.eos_token_id]

# open the train mode but disable the dropout
model.train()
disable_dropout_in_model(model)

# generate a random response
outputs = model.generate(input_ids=source_ids, 
                        attention_mask=source_mask, max_length=256, 
                        num_return_sequences=1, do_sample=True, eos_token_id=eos_token_id, temperature=1.0, num_beams=1, 
                        return_dict_in_generate=True,
                        output_scores=True)

# forward model twice with the same inputs
model_forward_1 = model(
    input_ids=source_ids,
    attention_mask=source_mask,
    labels=outputs.sequences,
    return_dict=True,
)
model_forward_2 = model(
    input_ids=source_ids,
    attention_mask=source_mask,
    labels=outputs.sequences,
    return_dict=True,
)

# print and compare the logits in two outputs, you will find they are different.
print(model_forward_1['logits'])
print(model_forward_2['logits'])

hi @jiaweihuang
Because the __call__ function (actually _call_impl) from nn.Module uses the random state each time it is called. Sorry, I couldn’t find the exact line, but here’s a reference:

Try this:

import torch

....

torch.manual_seed(0)

# forward model twice with the same inputs
model_forward_1 = model(
    input_ids=source_ids,
    attention_mask=source_mask,
    labels=outputs.sequences,
    return_dict=True,
)

torch.manual_seed(0)

model_forward_2 = model(
    input_ids=source_ids,
    attention_mask=source_mask,
    labels=outputs.sequences,
    return_dict=True,
)

# print and compare the logits from the two outputs; with the seed reset, they should match.
print(model_forward_1['logits'])
print(model_forward_2['logits'])

Hi,

this is because you’re passing do_sample=True to the generate() method, which makes decoding non-deterministic.

Refer to How to generate text: using different decoding methods for language generation with Transformers for an overview of the different decoding methods. Greedy decoding and beam search are examples of deterministic decoding methods.
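
For illustration, here’s a minimal sketch (assuming the model, source_ids, and source_mask from your script above, plus import torch) showing that greedy decoding gives identical sequences across repeated calls:

import torch

model.eval()  # make sure dropout is off as well

# do_sample=False means greedy decoding: the argmax token is taken at each step
greedy_1 = model.generate(input_ids=source_ids, attention_mask=source_mask,
                          max_length=256, do_sample=False, num_beams=1)
greedy_2 = model.generate(input_ids=source_ids, attention_mask=source_mask,
                          max_length=256, do_sample=False, num_beams=1)

assert torch.equal(greedy_1, greedy_2)  # identical token sequences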

Thanks for the response. But I’m still a bit confused about which part of model.forward has randomness. Given that I have disabled all the dropout layers, shouldn’t the output be the same even under different random seeds, since each step of the inference should then be deterministic?

Or maybe I missed something and there are some random mask layers in T5?
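
For what it’s worth, here is a small sanity check I can run (assuming the model and the disable_dropout_in_model call from my script above) to confirm that no dropout module is still active:

import torch.nn as nn

# after disable_dropout_in_model(model), every nn.Dropout should have p == 0,
# so this loop should print nothing
for name, module in model.named_modules():
    if isinstance(module, nn.Dropout) and module.p > 0:
        print(name, module.p)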

Hi, I do not think this is related to the randomness in generate(). I just use generate() to get a valid response.

What I compared is the difference between the outputs when I call model.forward twice with the same inputs (even though those inputs were returned by generate()).

Hi,

The from_pretrained method puts a model in evaluation mode by default, disabling things like dropout. So there’s no need to disable those yourself.

The randomness comes solely from the do_sample=True argument, which samples a random token at each time step of the generation. If you don’t pass this argument, greedy decoding is used, which takes the token with the highest probability at each time step.
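
As a quick sketch (reloading the same checkpoint), you can confirm the default mode via the training flag:

model = T5ForConditionalGeneration.from_pretrained('google-t5/t5-base')
print(model.training)  # False: from_pretrained returns the model in eval mode

model.train()
print(model.training)  # True: dropout layers are active again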

But we have the same issue if we run the forward pass with an arbitrary label tensor instead of outputs.sequences:

model_forward_1 = model(
    input_ids=source_ids,
    attention_mask=source_mask,
    labels=torch.tensor([[1,2,3]]),
    return_dict=True,
)
model_forward_2 = model(
    input_ids=source_ids,
    attention_mask=source_mask,
    labels=torch.tensor([[1,2,3]]),
    return_dict=True,
)

# print and compare the logits in two outputs, you will find they are different.
print(model_forward_1['logits'])
print(model_forward_2['logits'])

I’m not able to reproduce this. The following passes for me:

from transformers import T5Tokenizer, T5ForConditionalGeneration
import torch

tokenizer = T5Tokenizer.from_pretrained('google-t5/t5-base')
model = T5ForConditionalGeneration.from_pretrained('google-t5/t5-base')

inputs = tokenizer("translate English to German: Hugging Face is a technology company based in New York and Paris", return_tensors="pt")

# forward model twice with the same inputs
model_forward_1 = model(
    **inputs,
    labels=torch.tensor([[1,2,3]]),
)
model_forward_2 = model(
    **inputs,
    labels=torch.tensor([[1,2,3]]),
)

# compare the logits from the two forward passes; the assertion passes, i.e. they match
assert torch.allclose(model_forward_1['logits'], model_forward_2['logits'])

Hi, I guess I still don’t fully understand…

In my code, although I set do_sample=True, I only generate one output, which stays fixed while I run model.forward twice.
Besides, the logits returned by model.forward correspond to the (unnormalized) log-probabilities of the tokens, which should be fixed if I run inference with a fixed input.

So I do not think the randomness in the model.generate step can explain the different outputs in the later model.forward calls.

Indeed, even torch.equal returns True. I don’t know how I got two different (but close) results last time, sorry.
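
For reference, the strict check mentioned above is just a one-line sketch against the outputs of the earlier snippet:

# exact element-wise equality, stricter than torch.allclose
assert torch.equal(model_forward_1['logits'], model_forward_2['logits'])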