Zero loss when SFT fine-tuning meta-llama/Llama-2-7b-hf after increasing my batch size

Hello, when I fine-tune the llama2-7b model with SFT, I get zero loss if I set my train batch size to 16. However, when I change my batch size to 2, the training loss takes proper values. What could be the reason? My training data has sequences of about 500 tokens.