Cardinality issue when training BERT from scratch (TensorFlow)

Hello there!

I am trying to adapt the official Google Colab for language generation to TensorFlow, and everything seems to work wonderfully by simply prepending TF to most of the Hugging Face class names (TFAutoModel, etc.).

Unfortunately, this strategy fails at the training step:

from transformers import TFTrainer, TFTrainingArguments
import tensorflow as tf

training_args = TFTrainingArguments(
    "test-clm",
    evaluation_strategy="epoch",
    learning_rate=2e-5,
)

trainer = TFTrainer(
    model=model,
    args=training_args,
    train_dataset=lm_datasets[0:1000],
    eval_dataset=lm_datasets[1000:],
)

trainer.train()

self.num_train_examples = self.train_dataset.cardinality().numpy()
AttributeError: 'dict' object has no attribute 'cardinality'

I have absolutely no idea what this cardinality is. Do you know what the issue might be?
Thanks!
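For context, `cardinality()` is a method of `tf.data.Dataset` that reports how many elements the dataset contains; TFTrainer calls it to size the training loop. A plain Python dict (which is what slicing a tokenized dataset like `lm_datasets[0:1000]` returns) has no such method, hence the AttributeError. A minimal illustration:

```python
import tensorflow as tf

# tf.data.Dataset objects expose cardinality(), which counts elements.
ds = tf.data.Dataset.from_tensor_slices([10, 20, 30])
print(ds.cardinality().numpy())  # 3

# A plain dict does not, which is exactly the error TFTrainer raises.
plain_dict = {"input_ids": [[1, 2], [3, 4]]}
print(hasattr(plain_dict, "cardinality"))  # False
```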

I saw a related issue here: TensorFlow Question-Answering example fails to run (cardinality error) · Issue #10246 · huggingface/transformers · GitHub. Hugging Face masters, do you have an idea?

Thanks!

@BramVanroy Sorry to pull you in, but I wonder if you have any clues about what is happening here. Is this a TensorFlow-specific issue?

Seems like a potential bug. I’d recommend creating a bug report on GitHub, but please provide the whole error trace and not just the last line. Thanks!
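In the meantime, a possible workaround: TFTrainer expects its `train_dataset` and `eval_dataset` to be `tf.data.Dataset` objects (yielding `(features, labels)` pairs), not dict slices. A rough sketch of the conversion, with toy data standing in for the real tokenized dataset (the feature names below are illustrative, not taken from the original notebook):

```python
import tensorflow as tf

# Toy stand-in for a tokenized LM dataset: a dict of feature lists,
# which is roughly the shape that slicing a Hugging Face dataset returns.
features = {
    "input_ids": [[101, 2003, 102], [101, 2017, 102]],
    "attention_mask": [[1, 1, 1], [1, 1, 1]],
}
# For causal LM training, labels are typically the input_ids themselves.
labels = [[101, 2003, 102], [101, 2017, 102]]

# Wrap the dict and labels in a tf.data.Dataset of (features, labels) pairs.
train_ds = tf.data.Dataset.from_tensor_slices((features, labels))

# Unlike the plain dict, this object has the cardinality() TFTrainer needs.
print(train_ds.cardinality().numpy())  # 2
```

A dataset built this way can then be passed as `train_dataset=train_ds` to TFTrainer instead of the raw slice.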