Why does TextDatasetForNextSentencePrediction raise "EOFError: Ran out of input"?

I’m training a BERT model on the next sentence prediction (NSP) objective using TextDatasetForNextSentencePrediction.

from transformers import AutoTokenizer, BertTokenizerFast, TextDatasetForNextSentencePrediction

VOCAB_NAME = "bert-base-uncased"
MODEL_MAX_LEN = 512

tokenizer = AutoTokenizer.from_pretrained(VOCAB_NAME, model_max_length=MODEL_MAX_LEN)
# tokenizer = BertTokenizerFast.from_pretrained(VOCAB_NAME, model_max_length=MODEL_MAX_LEN)

dataset = TextDatasetForNextSentencePrediction(
    tokenizer=tokenizer,
    file_path=NSP_DATASET_PATH,
    block_size=MODEL_MAX_LEN
)
# NSP_DATASET_PATH points to a plain-text file built from Wikipedia,
# formatted the way the class requires: one sentence per line, with a
# blank line between documents.
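For reference, here is a minimal sketch of what that input layout looks like; the file name toy_nsp.txt and the sentences are hypothetical:

import pathlib

# Two toy "documents": one sentence per line, with a blank line between
# documents, matching the input format TextDatasetForNextSentencePrediction
# expects.
sample = (
    "This is the first sentence of document one.\n"
    "This is the second sentence of document one.\n"
    "\n"
    "Document two starts here.\n"
    "It also has a second sentence.\n"
)
pathlib.Path("toy_nsp.txt").write_text(sample, encoding="utf-8")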

But it keeps failing with EOFError: Ran out of input:

__init__(self, tokenizer, file_path, block_size, overwrite_cache, short_seq_probability, nsp_probability)
    401                 start = time.time()
    402                 with open(cached_features_file, "rb") as handle:
--> 403                     self.examples = pickle.load(handle)
    404                 logger.info(
    405                     f"Loading features from cached file {cached_features_file} [took %.3f s]", time.time() - start

EOFError: Ran out of input
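The traceback shows the error is raised while unpickling a previously written cache file, not while tokenizing, so my guess is that an earlier interrupted run left behind an empty or truncated cache. Below is a sketch of the two workarounds I see; the cached_nsp_ file-name prefix is what I read in the transformers source, so treat it as an assumption, and tokenizer, NSP_DATASET_PATH, and MODEL_MAX_LEN are the variables from the snippet above:

import glob
import os

# Delete any possibly corrupt cached features so they get rebuilt on the
# next run. The "cached_nsp_*" pattern is an assumption based on the
# transformers source; the cache is written next to the input file.
cache_dir = os.path.dirname(NSP_DATASET_PATH) or "."
for cached in glob.glob(os.path.join(cache_dir, "cached_nsp_*")):
    os.remove(cached)

# Or skip the cache entirely; overwrite_cache appears in the __init__
# signature in the traceback above.
dataset = TextDatasetForNextSentencePrediction(
    tokenizer=tokenizer,
    file_path=NSP_DATASET_PATH,
    block_size=MODEL_MAX_LEN,
    overwrite_cache=True,  # re-tokenize instead of unpickling the cache
)

Is a stale cache really the cause here, or is something else going on?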