Continue training XLNet on a specific closed-domain dataset

I’m wondering how to leverage the already pre-trained XLNet model (and its language knowledge) and fine-tune it on a specific closed-domain dataset, for example from the legal domain.

I already have the corpora; I’m just missing how to do this with XLNet-like models.

Any thoughts on how to do that?

Hi @krannnN, you can use the run_language_modeling.py script to fine-tune XLNet. You can find it here. You’ll just need to provide the dataset in the required format.
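As a rough sketch of the data preparation step: this assumes the script reads plain-text files (as the wiki.train.raw and wiki.test.raw paths in this thread suggest) and that one document per line is an acceptable layout for your corpus. The function name write_split, the file names train.raw and test.raw, and the 10% evaluation fraction are all made up for illustration and are not part of the script.

```python
import random
from pathlib import Path


def write_split(documents, out_dir, eval_fraction=0.1, seed=0):
    """Shuffle documents and write train.raw / test.raw, one document per line.

    Hypothetical helper: names and layout are assumptions, not the script's API.
    """
    # Collapse internal whitespace so each document fits on a single line,
    # and drop documents that are empty or whitespace-only.
    docs = [" ".join(d.split()) for d in documents if d.strip()]

    # Fixed seed so the split is reproducible between runs.
    rng = random.Random(seed)
    rng.shuffle(docs)

    # Hold out at least one document for evaluation.
    n_eval = max(1, int(len(docs) * eval_fraction))

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    train_path = out_dir / "train.raw"
    test_path = out_dir / "test.raw"

    test_path.write_text("\n".join(docs[:n_eval]) + "\n", encoding="utf-8")
    train_path.write_text("\n".join(docs[n_eval:]) + "\n", encoding="utf-8")
    return train_path, test_path
```

The two returned paths could then be exported as TRAIN_FILE and TEST_FILE before calling the script.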

Thank you for your reply, @valhalla. The README file doesn’t mention XLNet models.

export TRAIN_FILE=/path/to/dataset/wiki.train.raw
export TEST_FILE=/path/to/dataset/wiki.test.raw

python run_language_modeling.py \
    --output_dir=output \
    --model_type=xlnet \
    --model_name_or_path=xlnet \
    --do_train \
    --train_data_file=$TRAIN_FILE \
    --do_eval \
    --eval_data_file=$TEST_FILE

Is there any way to specify that I want to continue training from the last checkpoint rather than train from scratch?

Thanks
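
One possible way to locate the last checkpoint, sketched under the assumption that the training run saved its checkpoints as subdirectories named checkpoint-<step> inside the output directory. The helper name latest_checkpoint is hypothetical. Whether the script actually resumes when given such a directory depends on the script version, so treat this as a starting point rather than a confirmed answer.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None.

    Hypothetical helper: assumes checkpoints are saved as
    <output_dir>/checkpoint-<step> directories.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None

    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = []
    for child in root.iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            # Compare steps numerically: as strings, "500" would sort after "1000".
            candidates.append((int(match.group(1)), child))

    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]
```

If the script version in use loads weights from a local directory, the returned path could be passed as --model_name_or_path in place of the model name, for example after calling latest_checkpoint("output").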