Hi @muellerzr,
thanks for providing these useful statements. I am using the following scenario:
python -m torchrun --nproc_per_node 2 train_xxx.py
which is basically derived from nlp_example.py.
All I actually changed is the tokenize function and the dataset. After starting the script, the model gets downloaded and everything starts up properly. nvidia-smi shows both GPUs at approx. 80% usage - so far so good.
What worries me is that everything I have been logging so far, e.g. the size of the dataset, now shows up twice in the output:
2023-03-31 08:14:23.354 | DEBUG | __main__:get_dataloaders:79 - DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 7500
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 2500
    })
})
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
You're using a GPT2TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
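For the duplicated lines I was wondering whether I am simply supposed to guard my own prints so that only the main process emits them. A rough sketch of what I mean, assuming the Accelerator object from nlp_example.py is available (the messages are just placeholders of mine):

from accelerate import Accelerator

accelerator = Accelerator()

# every process runs the whole script, so a plain print() appears once per GPU;
# guarding on the main process (or using accelerator.print) avoids the duplicate
if accelerator.is_main_process:
    print("dataset sizes etc. - printed only once")

accelerator.print("same effect: printed only on the main process")

Is that the intended way, or does the duplicated output indicate a deeper problem?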
My question is: how can I make sure that the training is not just running once independently on each GPU, but is actually distributed across both of them?
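What I tried so far to convince myself that the data is actually sharded is printing the per-process view of the dataloader after accelerator.prepare. Again only a sketch with dummy data standing in for my tokenized dataset and my own variable names:

import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# dummy data of the same size as my train split, just for illustration
dataset = TensorDataset(torch.arange(7500).unsqueeze(1))
dataloader = DataLoader(dataset, batch_size=32)

dataloader = accelerator.prepare(dataloader)

# with --nproc_per_node 2 I would expect num_processes == 2 and each
# process to see roughly half of the ~235 batches
print(
    f"rank {accelerator.process_index}/{accelerator.num_processes}: "
    f"{len(dataloader)} batches on device {accelerator.device}"
)

Is checking the batch count per process like this a valid way to verify the distribution?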
Kind regards
Julian