Trainer only doing 3 epochs no matter the TrainingArguments!

Youss · June 19, 2022, 4:03pm

Hello huggingface community,
Just wanted to start by saying I am infinitely grateful for everything you have created!

I am a beginner with a basic/intermediate understanding of Python and started using transformers two days ago. I am working on a text classification problem with French data, for which I am using camembert-base as the pre-trained model.

This is my dataset:

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 85021
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 15004
    })
})

and its features:

{'label': ClassLabel(num_classes=20, names=['01. AGRI', '02. ALIM', '03. CHEMFER', '04. ATEX', '05. MACH', '06. MARNAV', '07. CONST', '08. MINES', "09. DOM", '10. TRAN', '11. ARARTILL', '12. PREELEC', '13. CER', '14. ACHIMI', '15. ECLA', '16. HABI', '17. ANDUS', '18. ARBU', '19. CHIRUR', '20. ARPA'], id=None),
 'text': Value(dtype='string', id=None)}

My TrainingArguments:

training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)

My Trainer:

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

What .train() shows:

***** Running training *****
Num examples = 85021
Num Epochs = 3
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 8
Gradient Accumulation steps = 1
Total optimization steps = 31884

|Epoch|Training Loss|Validation Loss|Accuracy|
|1|0.994300|0.972638|0.711610|
|2|0.825400|0.879027|0.736337|
|3|0.660800|0.893457|0.744401|

I would like to continue training beyond the 3 epochs to increase my accuracy and continue to decrease training and validation loss. Am I missing something here?

Youss · June 20, 2022, 6:24am

Bump
I keep searching online and can't find anything directly related to this (not being able to change the number of epochs). On a different thread on Stack Overflow someone said that the number of epochs depends on your data. But my understanding is that 1 epoch means the model has been trained on the entire dataset once, so that doesn't make sense to me.
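
The step count in the log is consistent with that reading of an epoch. Here is a minimal sketch of the arithmetic, assuming one optimizer step per batch (gradient accumulation of 1, a single device, and the last partial batch kept, which is what the log suggests). The helper name `total_steps` is made up for illustration and is not part of transformers:

```python
import math

def total_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps when every batch triggers exactly one update."""
    # The last, smaller batch still counts as a step, hence the ceiling.
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs

print(total_steps(85021, 8, 3))   # 31884, the value reported in the log
print(total_steps(85021, 8, 10))  # 106280, what a 10-epoch run would report
```

So "Total optimization steps = 31884" already says the run was configured for 3 epochs before training started; the data is not cutting it short.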

I tried fine-tuning a multilingual BERT instead and it still ran for just 3 epochs, so I think this must be related to my data or my code. If anyone could help, I would appreciate it.

cog · June 20, 2022, 6:40am

Hi @Youss,

Does your code have an early stopping setting?

Or have you checked whether 3 epochs is set explicitly somewhere else in your code, outside the trainer parameters you posted?

Regards.

Youss · June 20, 2022, 7:00am

Thank you kindly for replying, @cog!

I don't see any such setting. I more or less followed the “Fine-tune a pretrained model” notebook from Hugging Face. Here is essentially my code after preprocessing and splitting the dataset:

from transformers import AutoTokenizer, DataCollatorWithPadding
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding=True, max_length= 346)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
data_collator= DataCollatorWithPadding(tokenizer)

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-multilingual-cased", num_labels=20)

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=10,              # total number of training epochs
    per_device_train_batch_size=8,  # batch size per device during training
    per_device_eval_batch_size=16,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)

### Metrics
import numpy as np
from datasets import load_metric
metric = load_metric("accuracy")
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

### Trainer
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    )
trainer.train()
trainer.save_model('classification_trained_tuned_model-bert-multi')

cog · June 20, 2022, 8:41am

Why is training_args declared twice?

The Trainer is probably picking up the second definition, since it overwrites the first:

training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

That second definition does not set num_train_epochs, and its default is 3.0.

That is why your training only iterates for 3 epochs.
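
To see why the second assignment wins, here is a small sketch with a stand-in class. `FakeTrainingArguments` is hypothetical and only mimics two defaults of the real `TrainingArguments` (3.0 epochs, no evaluation); it needs no transformers install:

```python
from dataclasses import dataclass

@dataclass
class FakeTrainingArguments:
    """Hypothetical stand-in that only mimics two TrainingArguments defaults."""
    output_dir: str
    num_train_epochs: float = 3.0
    evaluation_strategy: str = "no"

training_args = FakeTrainingArguments(output_dir="./results", num_train_epochs=10)
first_epochs = training_args.num_train_epochs

# Re-assigning the same name builds a brand-new object; nothing is merged
# with the earlier one, so every option not passed here falls back to its default.
training_args = FakeTrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")
second_epochs = training_args.num_train_epochs

print(first_epochs, second_epochs)  # 10 3.0
```

The Trainer only ever sees whichever object the name points to when `Trainer(args=training_args, ...)` is called.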

Try this code:

from transformers import AutoTokenizer, DataCollatorWithPadding
tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, padding=True, max_length= 346)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
data_collator= DataCollatorWithPadding(tokenizer)

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-multilingual-cased", num_labels=20)

from transformers import TrainingArguments


## Use only this training_args for training
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=10,              # total number of training epochs
    per_device_train_batch_size=8,  # batch size per device during training
    per_device_eval_batch_size=16,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
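    logging_steps=10,
    evaluation_strategy="epoch",     # carried over from the removed second definition so evaluation still runs every epoch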
)

### Metrics
import numpy as np
from datasets import load_metric
metric = load_metric("accuracy")
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

### Trainer
from transformers import TrainingArguments, Trainer

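# This second definition replaced the one above and reset every option it did not set
# to its default (num_train_epochs=3.0), so it is commented out: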
# training_args = TrainingArguments(output_dir="test_trainer", evaluation_strategy="epoch")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
    )
trainer.train()
trainer.save_model('classification_trained_tuned_model-bert-multi')
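
A cheap guard against this kind of mix-up is to check the arguments object right before training. The sketch below is only an illustration: `assert_epochs` is a made-up helper, and `SimpleNamespace` stands in for a real `TrainingArguments` object so the snippet runs on its own.

```python
from types import SimpleNamespace

def assert_epochs(args, expected: float) -> None:
    """Raise early if the arguments object does not carry the intended epoch count."""
    actual = float(args.num_train_epochs)
    if actual != float(expected):
        raise ValueError(f"num_train_epochs is {actual}, expected {float(expected)}")

# SimpleNamespace stands in for a real TrainingArguments object here.
assert_epochs(SimpleNamespace(num_train_epochs=10), expected=10)  # passes silently
```

In the script above, the call would be `assert_epochs(trainer.args, 10)` just before `trainer.train()`, since `trainer.args` is the object the Trainer actually uses.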

Youss · June 20, 2022, 2:47pm

@cog
Thank you! That was the solution. I cannot believe I missed that. I think it happened because I was following the tutorial, which is spread over several pages with different code snippets.
