Why does the model give me different results whenever I evaluate using device_map="auto"?

I’m sorry for the duplicate (pytorch - Why does model gives me different result whenever I evaluate using device_map="auto"? - Stack Overflow), but I really want to figure out why this happens.

I’m running my code on 2 GPUs (RTX 3090) inside the PyTorch Docker container (pytorch:2.1.0-cuda12.1-cudnn8-runtime).
Since I’m new to PyTorch and the transformers ecosystem, I decided to start with some basic experiments, especially with LoRA.

So I produced LoRA weights for the roberta-base model, fine-tuned on the GLUE dataset, specifically the MRPC task.
After fine-tuning was done, I wanted to evaluate the model, so I loaded the base model and LoRA weights using RobertaForSequenceClassification and PeftModel.

For prediction, the model is loaded with device_map="auto", and it gives me very unstable results. (Please see the Stack Overflow link.)
While struggling with this, I eventually found that device_map affects the results: with device_map=0 the results are stable.
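For reference, this is roughly how I load and evaluate the model in both configurations (the adapter path and the sentence pair are placeholders; the sentences are just a sample MRPC-style paraphrase pair):

```python
# Minimal repro sketch (adapter path is a placeholder for my local LoRA weights).
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer
from peft import PeftModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")

def load_model(device_map):
    # Load the base classifier, then attach the LoRA adapter on top of it.
    base = RobertaForSequenceClassification.from_pretrained(
        "roberta-base", num_labels=2, device_map=device_map
    )
    return PeftModel.from_pretrained(base, "./lora-mrpc-adapter")

inputs = tokenizer(
    "He said the business doesn't fit the company's strategy.",
    "The business does not fit our long-term growth strategy.",
    return_tensors="pt",
)

# device_map="auto" shards the layers across both GPUs;
# device_map=0 keeps everything on GPU 0.
for device_map in ("auto", 0):
    model = load_model(device_map).eval()
    with torch.no_grad():
        logits = model(**{k: v.to(model.device) for k, v in inputs.items()}).logits
    print(device_map, logits)
```

With device_map=0 the printed logits are identical across repeated runs, but with device_map="auto" they vary from run to run.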

I’ve already read device_map='auto' gives bad results · Issue #20896 · huggingface/transformers · GitHub, but ACS doesn’t seem to be my problem.