Update train_dataset after trainer is initialized

I am trying to fine-tune BERT for token classification in a low-resource setting, where the goal is to use as few labeled samples as possible (this setting is also known as active learning). The pipeline is as follows:

  • Step 1: Train on initial labeled training set
  • Step 2: Repeat until the annotation budget is reached or performance is good enough
    • Add newly annotated samples to the training set based on some selection criterion
    • Retrain the model from the previously saved checkpoint

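The loop above can be sketched roughly as follows. This is only a schematic of my setup: score_samples, annotate, and retrain are hypothetical placeholders (they are not from any library), and the "model" is a dummy value standing in for the BERT checkpoint.

```python
import random

random.seed(0)

def score_samples(model, pool):
    # Placeholder acquisition function: random scores here;
    # in practice this would be e.g. prediction entropy.
    return {i: random.random() for i in pool}

def annotate(sample):
    # Placeholder for requesting a human annotation.
    return (sample, "label")

def retrain(model, train_set):
    # Placeholder: should fine-tune from the current weights,
    # NOT from scratch. Here just a dummy state update.
    return model + len(train_set)

def active_learning_loop(budget=6, batch=2):
    pool = set(range(20))                       # unlabeled pool
    train_set = [annotate(s) for s in (0, 1)]   # Step 1: initial labeled set
    pool -= {0, 1}
    model = retrain(0, train_set)
    spent = len(train_set)
    while spent < budget:                       # Step 2: until budget is reached
        scores = score_samples(model, pool)
        picked = sorted(pool, key=lambda i: scores[i], reverse=True)[:batch]
        for s in picked:
            pool.discard(s)
            train_set.append(annotate(s))       # add selected samples
        spent += len(picked)
        model = retrain(model, train_set)       # continue from checkpoint
    return train_set, model
```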
I have tried the following two implementations:

  • Re-initialize Trainer(...) at every iteration of Step 2. However, this causes fine-tuning to start from scratch each round, which seems incorrect.
  • Update train_dataset after Trainer(...) has been initialized. However, judging from the documentation, there appears to be no supported way to change train_dataset once the class is constructed.

Could someone help me resolve either of these two issues? Thank you for any input!