I am trying to fine-tune BERT for token classification in a low-resource setting, where the goal is to use as few labeled samples as possible (the iterative annotation setup below is usually called active learning). The pipeline is as follows (sketched in pseudocode after the list):
- Step 1: Train on the initial labeled training set
- Step 2: Repeat until the annotation budget is reached or performance is good enough:
  - Add newly annotated samples to the training set based on some selection criterion
  - Retrain the model, continuing from the previously saved checkpoint
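In code, the loop I have in mind looks roughly like this (pure pseudocode: `train`, `select_samples`, `annotate`, `budget_remaining`, and `good_enough` are all placeholders for my own code, not library calls):

```python
# Pseudocode of the intended pipeline; every function is a placeholder.
model = train(model, labeled_set)                     # Step 1: initial training

while budget_remaining() and not good_enough(model):  # Step 2
    batch = select_samples(unlabeled_pool)            # acquisition criterion
    labeled_set = labeled_set + annotate(batch)       # add newly annotated samples
    model = train(model, labeled_set)                 # should continue from the last checkpoint
```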
I have tried the following two implementations:

- Initializing a new `Trainer(...)` every time in Step 2 (sketched below). However, this causes the model to be fine-tuned from scratch in every round, which seems incorrect.
- Reusing the single `Trainer(...)` that is initialized in Step 1. However, from the doc, there seems to be no way to update `train_dataset` after the class is initialized.
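To make implementation 1 concrete, here is roughly what my loop looks like. This is a minimal sketch: the model name, label count, `num_rounds`, and the `grow_training_set` helper are placeholders for my actual setup, not real library functions.

```python
from transformers import (
    AutoModelForTokenClassification,
    Trainer,
    TrainingArguments,
)

num_rounds = 5       # placeholder annotation budget
train_dataset = ...  # initial labeled set (a datasets.Dataset in my code)

def grow_training_set(dataset):
    """Placeholder: add newly annotated samples chosen by my criterion."""
    ...

for rnd in range(num_rounds):
    train_dataset = grow_training_set(train_dataset)

    # Implementation 1: re-initialize the model and Trainer every round
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-cased", num_labels=9  # placeholders for my label set
    )
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"checkpoints/round-{rnd}"),
        train_dataset=train_dataset,
    )
    # This starts from the pretrained BERT weights again instead of
    # continuing from the checkpoint saved in the previous round.
    trainer.train()
```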
Could someone help me resolve these two issues? Thank you for any input!