Hi, I am trying to use the OpenAI GPT-2 model, and I just noticed that the hidden states change every time I run the model. I cannot figure out why. This does not happen when I use BertModel.
Does anyone have an explanation for that?
Thank you so much in advance!
Do you run the model in evaluation mode, i.e. do you call model.eval() before the forward pass?
This turns off all dropout modules, which otherwise zero out random activations on every call.
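To illustrate the dropout point, here is a minimal sketch in plain PyTorch. It does not load GPT-2. The small `nn.Sequential` below is a hypothetical stand-in for any module that contains dropout, and the layer sizes and dropout rate are arbitrary assumptions chosen only for the demo.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a model with dropout (NOT the real GPT-2).
model = nn.Sequential(nn.Linear(16, 64), nn.Dropout(p=0.5))
x = torch.randn(1, 16)

# Training mode: dropout draws a fresh random mask on every forward pass.
model.train()
with torch.no_grad():
    train_a = model(x)
    train_b = model(x)

# Evaluation mode: dropout is a no-op, so repeated passes agree.
model.eval()
with torch.no_grad():
    eval_a = model(x)
    eval_b = model(x)

same_in_train = torch.equal(train_a, train_b)
same_in_eval = torch.equal(eval_a, eval_b)
print("identical outputs in train mode:", same_in_train)
print("identical outputs in eval mode:", same_in_eval)
```

If the same pattern holds for your GPT-2 model, calling `model.eval()` once after loading and before extracting hidden states should make them reproducible. Note that `torch.no_grad()` only disables gradient tracking; it does not switch off dropout by itself.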
IIRC, GPT-2 does sampling, which is not deterministic at the token level.