I am using Unsloth to fine-tune a model, and I want to confirm what happens when I set max_steps to 1000, as in the code below:
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset["train"],
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,  # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 1,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        max_steps = 1000,
        learning_rate = 2e-4,
        fp16 = not torch.cuda.is_bf16_supported(),
        bf16 = torch.cuda.is_bf16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = "none",
        save_strategy = "steps",
        save_steps = 500,
    ),
)
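For reference, each checkpoint folder that save_steps writes out contains a trainer_state.json with the step counter, so I can at least confirm what step the checkpoint was saved at. Here is the small check I used (the path assumes the run actually reached step 1000):

import json
import os

# Inspect the step counter stored in the last checkpoint written by
# save_steps = 500 (assumes outputs/checkpoint-1000 exists).
ckpt_dir = os.path.join("outputs", "checkpoint-1000")
with open(os.path.join(ckpt_dir, "trainer_state.json")) as f:
    state = json.load(f)
print(state["global_step"])  # prints 1000 for this checkpoint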
If after some time I restart the session, re-initialize all the variables (including the dataset, loaded with the datasets library), and change the code above to:
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset["train"],
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,  # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        ...
        max_steps = 2000,
        ...
    ),
)
and then run trainer.train(resume_from_checkpoint=True), I can confirm that it indeed resumes from step 1001. What I am not sure about is which data it then trains on: does it start again from the first batch of the dataset, or does it correctly pick up at the batch it would have seen at step 1001? I also noticed that TrainingArguments has an ignore_data_skip flag that looks related, but I am not sure what the default behaviour is. Can someone please confirm this?
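In case it helps, below is the rough check I was planning to run on both sides of the restart to compare batches. LoggingSFTTrainer is just a name I made up for this post, and the exact training_step signature can vary across transformers versions, so treat it as a sketch rather than a definitive probe:

from trl import SFTTrainer

class LoggingSFTTrainer(SFTTrainer):
    """Hypothetical helper: prints a fingerprint of each batch it trains on."""

    def training_step(self, model, inputs, *args, **kwargs):
        # Log the current step together with the first few token ids of the
        # batch, so a resumed run can be compared against an uninterrupted one
        # at the same step. Use this class in place of SFTTrainer above.
        print(self.state.global_step, inputs["input_ids"][0][:8].tolist())
        return super().training_step(model, inputs, *args, **kwargs)

(I also understand that resume_from_checkpoint can take an explicit path such as "outputs/checkpoint-1000" instead of True.) Thanks in advance.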