Different last hidden state output on different machines, same tokens

Hi,

I’m using the following code to load a pre-trained model:

from transformers import LlamaForCausalLM, LlamaTokenizer

model_name = "samwit/koala-7b"
tokenizer = LlamaTokenizer.from_pretrained(model_name)
base_model = LlamaForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,          # 8-bit quantization via bitsandbytes
    device_map="auto",
    output_hidden_states=True,  # make forward() return all hidden states
)

Then I run the following code on two different machines:

import torch

input_ids = tokenizer("AAA", return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    output = base_model(**input_ids)
print(output.hidden_states[-1].mean(dim=1).squeeze().tolist()[0])

On one machine I get -0.328857421875, and on the other I get -0.28759765625. The input_ids are identical on both machines, but the output values differ.
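
In case it's relevant, here's a quick way to compare the two environments, since different torch/transformers/bitsandbytes builds (or different GPUs) can change the low-level numerics. A minimal sketch:

import importlib.metadata as md
import torch

# print package versions on each machine and diff them
for pkg in ("transformers", "bitsandbytes", "accelerate"):
    print(pkg, md.version(pkg))
print("torch", torch.__version__)
print("gpu", torch.cuda.get_device_name(0))  # assumes a CUDA GPU is available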

Setting a manual_seed didn't make any difference.
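
For reference, this is the kind of seeding I tried (a minimal sketch; set_seed from transformers seeds Python's random, NumPy, and torch on CPU and CUDA):

import torch
from transformers import set_seed

set_seed(42)           # seeds random, numpy, torch (CPU + CUDA)
torch.manual_seed(42)  # redundant with set_seed, shown for completeness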

Does this make sense? Shouldn't this be deterministic?
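
One check I can think of that might isolate the cause: loading the model without load_in_8bit and comparing again, to see whether the 8-bit quantization is the source of the difference (a minimal sketch; assumes there is enough memory to hold the model in full precision):

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_name = "samwit/koala-7b"
tokenizer = LlamaTokenizer.from_pretrained(model_name)
fp_model = LlamaForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float32,  # no load_in_8bit here
    output_hidden_states=True,
)
fp_model.eval()

input_ids = tokenizer("AAA", return_tensors="pt")
with torch.no_grad():
    output = fp_model(**input_ids)
print(output.hidden_states[-1].mean(dim=1).squeeze().tolist()[0])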

Thank you