Since --max_steps is required when streaming is enabled:
If my training data has 10B tokens, a seq_len (block_size) of 1024, and a global batch size of 128, is my calculation of max_steps for, say, 5 epochs correct?
Calculation for max_steps:
- (1024 * 128) tokens per step.
- max_steps = (10B / (1024 * 128)) * 5
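The arithmetic above can be sketched in a few lines of Python (the 10B token count, block size, and batch size are the figures from the question; rounding up per epoch is an assumption so no tokens are dropped):

```python
import math

total_tokens = 10_000_000_000  # 10B tokens of training data
block_size = 1024              # seq_len / block_size
global_batch_size = 128        # per_device_batch * num_devices * grad_accum
num_epochs = 5

# Each optimizer step consumes block_size * global_batch_size tokens
tokens_per_step = block_size * global_batch_size  # 131072

# Round up so a partial final batch still counts as a step
steps_per_epoch = math.ceil(total_tokens / tokens_per_step)
max_steps = steps_per_epoch * num_epochs

print(max_steps)  # value to pass as --max_steps
```

Note that if gradient accumulation is used, the global batch size here must already include the accumulation factor, since --max_steps counts optimizer steps, not micro-batches.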
Hi @Palash123, that looks right to me. Can you specify the exact example you’re referring to, just to make sure?
Hi @regisss, this is regarding an example from language-modeling.
python run_clm.py \
--model_name_or_path gpt2 \
--dataset_name wikitext \
--dataset_config_name wikitext-2-raw-v1 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--do_train \
--output_dir /tmp/test-clm \
--gaudi_config_name Habana/gpt2 \
--use_habana \
--use_lazy_mode \
--use_hpu_graphs_for_inference \
--throughput_warmup_steps 3 \
--streaming \
--max_steps 1000 \
--do_eval
Thanks, so yes, your calculation looks right.