Regarding max steps, streaming in language modeling

Palash123 · April 11, 2024, 6:16pm

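Since --max_steps is required when --streaming is enabled (a streamed dataset has no known length, so the Trainer cannot derive the number of steps from the number of epochs):
If my training data has 10B tokens, a seq_len (block_size) of 1024, and a global batch size of 128,
is my calculation of max_steps for, say, 5 epochs correct?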

Calculation for max_steps:

(1024 * 128) tokens per step.
max_steps = (10B / (1024 * 128)) * 5
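The arithmetic above can be checked with a small Python sketch. The helper name `max_steps_for_epochs` is hypothetical. It assumes the 10B figure counts tokens after they have been packed into full 1024-token blocks, and it rounds up so that a final partial step is not lost (rounding down instead changes the result by at most one step).

```python
def max_steps_for_epochs(total_tokens: int, block_size: int,
                         global_batch_size: int, epochs: int) -> int:
    """Hypothetical helper: optimizer steps needed to see `total_tokens` `epochs` times.

    Assumes every training sample is a full `block_size`-token block.
    """
    tokens_per_step = block_size * global_batch_size  # 1024 * 128 = 131,072
    total = total_tokens * epochs
    # Integer ceiling division, so a final partial step still counts.
    return (total + tokens_per_step - 1) // tokens_per_step


steps = max_steps_for_epochs(10_000_000_000, 1024, 128, 5)
print(steps)  # 381470
```

Doing the computation in integers avoids floating-point surprises for large token counts.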

regisss · April 12, 2024, 7:20am

Hi @Palash123, that looks right to me. Can you point to the exact example you're referring to, just to be sure?

Palash123 · April 13, 2024, 10:27am

Hi @regisss, this is regarding an example from the language-modeling examples:

python run_clm.py \
    --model_name_or_path gpt2 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --do_train \
    --output_dir /tmp/test-clm \
    --gaudi_config_name Habana/gpt2 \
    --use_habana \
    --use_lazy_mode \
    --use_hpu_graphs_for_inference \
    --throughput_warmup_steps 3 \
    --streaming \
    --max_steps 1000 \
    --do_eval
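For a command like this one, the "global batch size" in the formula is not --per_device_train_batch_size on its own. With the Hugging Face Trainer it is the per-device batch size multiplied by the number of devices and by --gradient_accumulation_steps (default 1). The sketch below uses hypothetical helper names. It assumes a single device, no gradient accumulation, and a block size of 1024 (run_clm.py caps its default block size at 1024), and it estimates how many tokens --max_steps 1000 would consume.

```python
def global_batch_size(per_device_batch: int, num_devices: int,
                      grad_accum_steps: int = 1) -> int:
    """Hypothetical helper: samples per optimizer step across all devices."""
    return per_device_batch * num_devices * grad_accum_steps


def tokens_seen(max_steps: int, block_size: int, gbs: int) -> int:
    """Hypothetical helper: tokens consumed, assuming full blocks every step."""
    return max_steps * block_size * gbs


# The command above: --per_device_train_batch_size 4, assumed 1 device.
gbs = global_batch_size(per_device_batch=4, num_devices=1)
print(gbs)                           # 4
print(tokens_seen(1000, 1024, gbs))  # 4096000

# A global batch size of 128 could come from, e.g., 16 per device on 8 devices.
print(global_batch_size(per_device_batch=16, num_devices=8))  # 128
```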

regisss · April 13, 2024, 1:36pm

Thanks! So yeah, your calculation looks right.
