Hello huggingface community,
Just wanted to start by saying I am infinitely grateful for everything you have created!
I am a beginner with a basic/intermediate understanding of Python and just started using transformers two days ago. I am facing a text classification problem with French data, for which I'm using camembert-base as the pre-trained model.
This is my dataset:

```python
DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 85021
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 15004
    })
})
```
and its features:
```python
{'label': ClassLabel(num_classes=20, names=['01. AGRI', '02. ALIM', '03. CHEMFER', '04. ATEX', '05. MACH', '06. MARNAV', '07. CONST', '08. MINES', '09. DOM', '10. TRAN', '11. ARARTILL', '12. PREELEC', '13. CER', '14. ACHIMI', '15. ECLA', '16. HABI', '17. ANDUS', '18. ARBU', '19. CHIRUR', '20. ARPA'], id=None),
 'text': Value(dtype='string', id=None)}
```
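For context, the `ClassLabel` feature maps each of the 20 class names to an integer id, and those ids are what the model predicts. A minimal sketch of that mapping (pure Python, using the names above; the `label2id`/`id2label` dicts are the kind of thing you can also pass to the model config so predictions print readable names):

```python
# Class names taken from the dataset's ClassLabel feature
names = [
    "01. AGRI", "02. ALIM", "03. CHEMFER", "04. ATEX", "05. MACH",
    "06. MARNAV", "07. CONST", "08. MINES", "09. DOM", "10. TRAN",
    "11. ARARTILL", "12. PREELEC", "13. CER", "14. ACHIMI", "15. ECLA",
    "16. HABI", "17. ANDUS", "18. ARBU", "19. CHIRUR", "20. ARPA",
]

# Map each name to its integer id and back
label2id = {name: i for i, name in enumerate(names)}
id2label = {i: name for i, name in enumerate(names)}
```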
My TrainingArguments:
```python
training_args = TrainingArguments(
    output_dir='./results',
    num_train_epochs=10,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
)
```
My Trainer:
```python
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
```
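My `compute_metrics` isn't shown above; it is along these lines (a minimal sketch with NumPy only — the exact implementation may differ slightly):

```python
import numpy as np

def compute_metrics(eval_pred):
    """Compute accuracy from the Trainer's (logits, labels) pair."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)  # predicted class id per example
    return {"accuracy": float((preds == labels).mean())}
```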
This is what `.train()` shows:

```
***** Running training *****
  Num examples = 85021
  Num Epochs = 3
  Instantaneous batch size per device = 8
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 1
  Total optimization steps = 31884
```

| Epoch | Training Loss | Validation Loss | Accuracy |
|-------|---------------|-----------------|----------|
| 1 | 0.994300 | 0.972638 | 0.711610 |
| 2 | 0.825400 | 0.879027 | 0.736337 |
| 3 | 0.660800 | 0.893457 | 0.744401 |
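Sanity-checking the step count against the log: 85021 examples at a batch size of 8 gives 10,628 steps per epoch, and 3 × 10,628 = 31,884 — exactly the "Total optimization steps" reported, so the run really was scheduled for 3 epochs rather than 10:

```python
import math

num_examples = 85021       # Num examples from the log
per_device_batch = 8       # per_device_train_batch_size
epochs_in_log = 3          # Num Epochs from the log

steps_per_epoch = math.ceil(num_examples / per_device_batch)
total_steps = steps_per_epoch * epochs_in_log
print(steps_per_epoch, total_steps)  # 10628 31884
```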
I set `num_train_epochs=10` in my `TrainingArguments`, yet the log shows only 3 epochs. I would like to continue training beyond those 3 epochs to increase accuracy and further decrease the training and validation loss. Am I missing something here?