Difference in return sequence for Phi3 model

@RaushanTurganbay This is the code sample:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_path,
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=True,
    output_hidden_states=True,
    return_dict_in_generate=True,
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_model_path)

prompt = "Hello How are you?"
messages = [
    {"role": "user", "content": prompt},
]

pipe = CustomPipeline(
    model=model,
    tokenizer=tokenizer,
)
```

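For reference, here is a minimal sketch (my own, not part of CustomPipeline) of the type difference the pipeline has to handle; `max_new_tokens=128` is an arbitrary value, and the exact output class name depends on the transformers version:

```python
# Tokenize the chat messages; apply_chat_template returns the input ids.
model_inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Without the flag: a plain LongTensor of token ids, shape (batch, seq_len).
plain = model.generate(model_inputs, max_new_tokens=128)

# With the flag: a ModelOutput (GenerateDecoderOnlyOutput on recent versions),
# which carries the token ids in .sequences alongside hidden_states etc.
wrapped = model.generate(
    model_inputs,
    max_new_tokens=128,
    return_dict_in_generate=True,
    output_hidden_states=True,
)

print(type(plain), type(wrapped))
```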
With `return_dict_in_generate=True`, I get the following output:

```
[
    {'role': 'user', 'content': 'Hello How are you?'},
    {
        'role': 'assistant',
        'content': " Hello! I'm doing well. How about you? How can I help you today? Hello! I'm an AI, so I don't ha[…] What can I do for you today? Greetings! As an AI, I don't have personal experiences, but I'm fully operational and r[…] assistance you need. What's on your mind?"
    }
]
```

And with `return_dict_in_generate=False`:

```
You are not running the flash-attention implementation, expect numerical differences.
[{'role': 'user', 'content': 'Hello How are you?'}, {'role': 'assistant', 'content': " Hello! I'm doing well. How about you? […]
```
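In case it helps narrow this down, a hedged sketch of the unwrapping that makes both settings decode the same text; the variable names are my own, and `model_inputs` is as in the sketch above:

```python
import torch

out = model.generate(
    model_inputs,
    max_new_tokens=128,
    return_dict_in_generate=True,
    output_hidden_states=True,
)

# Normalize: pull the id tensor out of the ModelOutput when the flag is set.
token_ids = out.sequences if not isinstance(out, torch.Tensor) else out

# Strip the prompt tokens so only the assistant reply is decoded.
reply_ids = token_ids[:, model_inputs.shape[-1]:]
print(tokenizer.batch_decode(reply_ids, skip_special_tokens=True)[0])
```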