Okay, it’s my first post here and I am not a very experienced programmer, so forgive me if I am asking something trivial :))
I am working on fine-tuning LLMs on a downstream task, and I use the Accelerate library to train on 2 GPUs (DDP setting). I can’t share the code right now, but my setup looks like the tutorial in run_native_loop.
I am trying to set the seed with the transformers set_seed method to get reproducible experiments, and I am changing the seed to get 3 different runs. The problem is that the data is loaded in batches in the same way every time (meaning that the first batch is always the same, on both GPUs). After shuffling, in epoch 2 I get shuffled training samples, but I still get the same ones even when I change the seed. This is something I cannot explain. How is this even possible?
The paradox is that setting the seed works fine for things like adding new randomly initialized rows in the embedding layer (different seed=different initialization), but the data keeps loading in the same order.
Is this the way it is supposed to be? It seems really strange to me. Maybe accelerator creates the problem? Or is it the way dataloaders are wrapped for the DDP scenario?
To clarify, the data is indeed shuffled before a new epoch. What drives me crazy is that it is still shuffled in the same way: I change the seed and I get the same sample order in epoch 2.
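A toy sketch of one way this symptom can arise, offered as an assumption rather than a diagnosis: if the sampler that shuffles the data owns a private random generator seeded from its own value (PyTorch's DistributedSampler, for instance, takes a `seed` argument that defaults to 0 and reshuffles from `seed + epoch`), then changing the global seed changes weight initialization but not the sample order. Whether the Accelerate-prepared dataloader behaves this way in this setup is not confirmed here. The names `epoch_order`, `run`, `global_seed` and `sampler_seed` are hypothetical, and plain `random` stands in for the real libraries.

```python
import random


def epoch_order(num_samples, sampler_seed, epoch):
    """Hypothetical stand-in for a sampler with its own RNG.

    The order depends only on sampler_seed + epoch, never on the global seed.
    """
    rng = random.Random(sampler_seed + epoch)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    return indices


def run(global_seed, sampler_seed=0, epochs=2, num_samples=8):
    # Stands in for set_seed(global_seed): seeds the global RNG only.
    random.seed(global_seed)
    # Stands in for a randomly initialized embedding row (uses the global RNG).
    init_value = random.random()
    # The sampler ignores the global RNG and uses its private seed.
    orders = [epoch_order(num_samples, sampler_seed, e) for e in range(epochs)]
    return init_value, orders


init_a, orders_a = run(global_seed=1)
init_b, orders_b = run(global_seed=2)

print(init_a != init_b)      # True: initialization follows the global seed
print(orders_a == orders_b)  # True: sample order does not

# Passing the run's seed to the sampler makes the order depend on it.
_, orders_c = run(global_seed=2, sampler_seed=2)
print(orders_c)
```

If something like this is the cause, the thing to try would be passing the run's seed to whatever builds the sampler or dataloader generator, instead of relying on the global seed alone.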
You should pass do_sample=True in your generation config or in your .generate() call. Most models have it off by default, causing the generation to be deterministic (and ignoring parameters like temperature, top_k, etc).
With temperature=0.2, the relative weight of the most likely logits is massively increased, making generation almost deterministic. Even if there are no bugs in your script, it’s far from guaranteed that two different seeds will produce different outputs with such a low temperature.
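The two points above can be illustrated with a toy model in plain Python. This is only a sketch, not the transformers implementation: the logits are made up, and `softmax_with_temperature` and `pick_token` are hypothetical helpers. Greedy selection (the `do_sample=False` case) never consults the random generator, and a temperature of 0.2 concentrates almost all probability on the top token.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(probs, do_sample, rng):
    """Toy decoding step: argmax when do_sample is False, otherwise a random draw."""
    if not do_sample:
        return max(range(len(probs)), key=probs.__getitem__)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.5]  # made-up values for illustration

p_default = softmax_with_temperature(logits, 1.0)
p_low = softmax_with_temperature(logits, 0.2)

# Top-token probability: about 0.63 at temperature 1.0, above 0.99 at 0.2.
print(round(p_default[0], 2), round(p_low[0], 2))

# Greedy decoding gives the same token for every seed.
greedy_picks = {pick_token(p_default, False, random.Random(seed)) for seed in range(5)}
print(greedy_picks)  # {0}

# Sampling does use the seed, but at low temperature most draws still hit token 0.
sampled_picks = [pick_token(p_low, True, random.Random(seed)) for seed in range(5)]
print(sampled_picks)
```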
Well, thank you for your response, but I don’t think this is the case. My code is only for training, while the above-mentioned post was about the generate function (used at inference). The dataloader is the problem in my code, but I can’t see why.