I am following the Hugging Face NLP course chapter on question answering and I am getting this error:
AttributeError: 'TFBertForQuestionAnswering' object has no attribute 'prepare_tf_dataset'
from this code given in the course:
tf_train_dataset = model.prepare_tf_dataset(
    train_dataset,
    collate_fn=data_collator,
    shuffle=True,
    batch_size=16,
)
tf_eval_dataset = model.prepare_tf_dataset(
    validation_dataset,
    collate_fn=data_collator,
    shuffle=False,
    batch_size=16,
)
Please, what can be done to solve this issue?
package versions:
transformers → 4.18.0
datasets → 2.14.4
tensorflow → 2.10.0
Here is the course link: Question answering - Hugging Face NLP Course
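The likely root cause is the transformers version: as far as I know, `prepare_tf_dataset` was only added to the model classes around transformers v4.20.0, so v4.18.0 predates it, and upgrading (`pip install --upgrade transformers`) should make the course code work as written. A tiny stdlib-only helper (hypothetical, just to illustrate the version comparison) would be:

```python
def needs_upgrade(installed: str, required: str = "4.20.0") -> bool:
    """Return True when the installed version predates `required`.
    Compares dotted version strings numerically, part by part."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) < as_tuple(required)

print(needs_upgrade("4.18.0"))  # True: this version lacks prepare_tf_dataset
print(needs_upgrade("4.30.2"))  # False
```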
I'm also getting a similar error for a machine translation task using MBART:
AttributeError: 'MBartForConditionalGeneration' object has no attribute 'prepare_tf_dataset'
code:
train_dataset = model.prepare_tf_dataset(
    tokenized_datasets["train"],
    batch_size=batch_size,
    shuffle=True,
    collate_fn=data_collator,
)
Kindly share any solution.
Here's what I did for a similar error; it's more of a workaround than a fix.
tokenized_datasets = raw_datasets.map(
    prepare_train_features,
    batched=True,
    batch_size=10,
    remove_columns=raw_datasets["train"].column_names,
    # num_proc=3,
)
Having tokenized my dataset, I then used TensorFlow directly to convert the data to TensorFlow format.
# Convert the NumPy-format dataset to TensorFlow format
import tensorflow as tf

train_set = tf.data.Dataset.from_tensor_slices((
    {
        "input_ids": tokenized_datasets["train"]["input_ids"],
        "attention_mask": tokenized_datasets["train"]["attention_mask"]
    },
    {
        "start_positions": tokenized_datasets["train"]["start_positions"],
        "end_positions": tokenized_datasets["train"]["end_positions"]
    }
))
validation_set = tf.data.Dataset.from_tensor_slices((
    {
        "input_ids": tokenized_datasets["validation"]["input_ids"],
        "attention_mask": tokenized_datasets["validation"]["attention_mask"]
    },
    {
        "start_positions": tokenized_datasets["validation"]["start_positions"],
        "end_positions": tokenized_datasets["validation"]["end_positions"]
    }
))
print(train_set) # checking the shape of the data
print(validation_set)
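For reference, the tuples passed to `from_tensor_slices` above yield `(features, labels)` pairs, which is the structure `model.fit` expects for question answering. A toy, plain-Python sketch of a single pair (placeholder values, not real token ids):

```python
# One example in the (features, labels) layout that Keras consumes.
features = {
    "input_ids": [101, 2054, 2003, 102],  # placeholder token ids
    "attention_mask": [1, 1, 1, 1],       # 1 = real token, 0 = padding
}
labels = {
    "start_positions": 1,  # index of the answer's first token
    "end_positions": 2,    # index of the answer's last token
}
example = (features, labels)
print(sorted(example[0]))  # ['attention_mask', 'input_ids']
print(sorted(example[1]))  # ['end_positions', 'start_positions']
```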
# Define the batch size
batch_size = 5  # Adjust this value as needed

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
# Optionally uncomment the next line for float16 training
# tf.keras.mixed_precision.set_global_policy("mixed_float16")
model.compile(optimizer=optimizer)

train_set = train_set.batch(batch_size)
validation_set = validation_set.batch(batch_size)
model.fit(train_set, validation_data=validation_set, epochs=1)
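One caveat with the `from_tensor_slices` approach: unlike `prepare_tf_dataset` with a data collator, it does no dynamic padding, so every tokenized example must already have the same length (e.g. padded to `max_length` during tokenization). A small stdlib-only sanity check (hypothetical helper name) you could run before converting:

```python
def all_same_length(sequences):
    """True if every tokenized sequence has the same length,
    a precondition for tf.data.Dataset.from_tensor_slices."""
    return len({len(s) for s in sequences}) <= 1

print(all_same_length([[1, 2, 3], [4, 5, 6]]))  # True: safe to convert
print(all_same_length([[1, 2], [3]]))           # False: pad first
```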
Hope this helps.
Thanks for sharing, I’ll try it.