I am trying to fine-tune BERT for token classification in a low-resource setting, where the goal is to use as few labeled samples as possible (the iterative annotation setup below is usually called active learning). The pipeline is as follows (sketched in pseudocode after the list):
- Step 1: Train on the initial labeled training set
- Step 2: Repeat until the annotation budget is reached or performance is good enough:
  - Add newly annotated samples to the training set based on some selection criterion
  - Retrain the model, continuing from the previously saved checkpoint
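In code, the loop I have in mind looks roughly like this (pure pseudocode: `train`, `select_samples`, `annotate`, `budget_remaining`, and `good_enough` are all placeholders for my own code, not library calls):

```python
# Pseudocode of the intended pipeline; every function is a placeholder.
model = train(model, labeled_set)                     # Step 1: initial training

while budget_remaining() and not good_enough(model):  # Step 2
    batch = select_samples(unlabeled_pool)            # acquisition criterion
    labeled_set = labeled_set + annotate(batch)       # add newly annotated samples
    model = train(model, labeled_set)                 # should continue from the last checkpoint
```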
I have tried the following two implementations:

- Initializing a new `Trainer(...)` every time in Step 2 (sketched below). However, this causes the model to be fine-tuned from scratch in every round, which seems incorrect.
- Reusing the single `Trainer(...)` that is initialized in Step 1. However, from the doc, there seems to be no way to update `train_dataset` after the class is initialized.
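To make implementation 1 concrete, here is roughly what my loop looks like. This is a minimal sketch: the model name, label count, `num_rounds`, and the `grow_training_set` helper are placeholders for my actual setup, not real library functions.

```python
from transformers import (
    AutoModelForTokenClassification,
    Trainer,
    TrainingArguments,
)

num_rounds = 5       # placeholder annotation budget
train_dataset = ...  # initial labeled set (a datasets.Dataset in my code)

def grow_training_set(dataset):
    """Placeholder: add newly annotated samples chosen by my criterion."""
    ...

for rnd in range(num_rounds):
    train_dataset = grow_training_set(train_dataset)

    # Implementation 1: re-initialize the model and Trainer every round
    model = AutoModelForTokenClassification.from_pretrained(
        "bert-base-cased", num_labels=9  # placeholders for my label set
    )
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=f"checkpoints/round-{rnd}"),
        train_dataset=train_dataset,
    )
    # This starts from the pretrained BERT weights again instead of
    # continuing from the checkpoint saved in the previous round.
    trainer.train()
```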
Could someone help me resolve these two issues? Thank you for any input!