Cannot import Data Collator For PLM

Hello, I was testing the feature newly added to transformers in 3.0, the ability to continue training transformers models (XLNet in my case), but I’m getting this error:

2020-07-20 10:47:29.463663: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "run_language_modeling.py", line 29, in <module>
    from transformers import (
ImportError: cannot import name 'DataCollatorForPermutationLanguageModeling'

I made sure that I installed the latest version of the library.

Hi @krannnN, this change is recent and not yet included in a release, so you’ll need to install from source.

Thank you, this fixed it!
I have one more question: how am I supposed to split the raw data into train and test sets for this task? The README for the language modeling example is a little outdated and doesn’t mention XLNet, which I know was added recently.

You’ll need to split the dataset yourself. An 80-20 or 90-10 split works, depending on the number of examples in your dataset. So if your train file has 100 examples, you can take 20 of them out and put them in the validation file. You can then pass the eval file using the eval_data_file argument. Hope this helps!
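
In case it helps, here is a rough sketch of that split in plain Python. It assumes your raw data has one example per line, which may not match your corpus. The helper names (split_examples, split_file) and the file names in the usage note are made up for illustration and are not part of transformers.

```python
import random
from pathlib import Path


def split_examples(examples, eval_fraction=0.2, seed=42):
    """Shuffle a copy of `examples` and return (train, eval) lists.

    Hypothetical helper: assumes every item is one independent example.
    """
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be strictly between 0 and 1")
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_eval = int(len(shuffled) * eval_fraction)
    # The first n_eval shuffled items are held out; the rest are for training.
    return shuffled[n_eval:], shuffled[:n_eval]


def split_file(raw_path, train_path, eval_path, eval_fraction=0.2, seed=42):
    """Split a one-example-per-line text file into a train and an eval file."""
    text = Path(raw_path).read_text(encoding="utf-8")
    # Skip blank lines so they are not counted as examples.
    lines = [line for line in text.splitlines() if line.strip()]
    train, held_out = split_examples(lines, eval_fraction, seed)
    Path(train_path).write_text("\n".join(train) + "\n", encoding="utf-8")
    Path(eval_path).write_text("\n".join(held_out) + "\n", encoding="utf-8")
    return len(train), len(held_out)
```

For example, split_file("raw.txt", "train.txt", "eval.txt") on a 100-line file would write 80 lines to train.txt and 20 to eval.txt. You would then pass eval.txt through eval_data_file as described above. If your corpus is continuous text rather than one example per line, shuffling lines would break up the context, so splitting by document is probably the safer choice.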