I am following the Hugging Face NLP course chapter on question answering and I am getting this error:
AttributeError: 'TFBertForQuestionAnswering' object has no attribute 'prepare_tf_dataset'
from this code given in the course:
tf_train_dataset = model.prepare_tf_dataset(
    train_dataset,
    collate_fn=data_collator,
    shuffle=True,
    batch_size=16,
)
tf_eval_dataset = model.prepare_tf_dataset(
    validation_dataset,
    collate_fn=data_collator,
    shuffle=False,
    batch_size=16,
)
Please, what can be done to solve this issue?
package versions:
transformers → 4.18.0
datasets → 2.14.4
tensorflow → 2.10.0
Here is the course link: Question answering - Hugging Face NLP Course
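The likely root cause is the transformers version: as far as I know, `prepare_tf_dataset` was only added to the model classes around transformers v4.20.0, so v4.18.0 predates it, and upgrading (`pip install --upgrade transformers`) should make the course code work as written. A tiny stdlib-only helper (hypothetical, just to illustrate the version comparison) would be:

```python
def needs_upgrade(installed: str, required: str = "4.20.0") -> bool:
    """Return True when the installed version predates `required`.
    Compares dotted version strings numerically, part by part."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) < as_tuple(required)

print(needs_upgrade("4.18.0"))  # True: this version lacks prepare_tf_dataset
print(needs_upgrade("4.30.2"))  # False
```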
I'm also getting a similar error for a machine translation task using MBART:
AttributeError: 'MBartForConditionalGeneration' object has no attribute 'prepare_tf_dataset'
code:
train_dataset = model.prepare_tf_dataset(
    tokenized_datasets["train"],
    batch_size=batch_size,
    shuffle=True,
    collate_fn=data_collator,
)
Kindly share any solution.
Here's what I did for a similar error; it's more of a workaround than a fix.
tokenized_datasets = raw_datasets.map(
    prepare_train_features,
    batched=True,
    batch_size=10,
    remove_columns=raw_datasets["train"].column_names,
    # num_proc=3,
)
Having tokenized my dataset, I then used TensorFlow directly to convert the data to TensorFlow format.
# Convert the NumPy-format dataset to TensorFlow format
import tensorflow as tf

train_set = tf.data.Dataset.from_tensor_slices((
    {
        "input_ids": tokenized_datasets["train"]["input_ids"],
        "attention_mask": tokenized_datasets["train"]["attention_mask"]
    },
    {
        "start_positions": tokenized_datasets["train"]["start_positions"],
        "end_positions": tokenized_datasets["train"]["end_positions"]
    }
))
validation_set = tf.data.Dataset.from_tensor_slices((
    {
        "input_ids": tokenized_datasets["validation"]["input_ids"],
        "attention_mask": tokenized_datasets["validation"]["attention_mask"]
    },
    {
        "start_positions": tokenized_datasets["validation"]["start_positions"],
        "end_positions": tokenized_datasets["validation"]["end_positions"]
    }
))
print(train_set) # checking the shape of the data
print(validation_set)
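For reference, the tuples passed to `from_tensor_slices` above yield `(features, labels)` pairs, which is the structure `model.fit` expects for question answering. A toy, plain-Python sketch of a single pair (placeholder values, not real token ids):

```python
# One example in the (features, labels) layout that Keras consumes.
features = {
    "input_ids": [101, 2054, 2003, 102],  # placeholder token ids
    "attention_mask": [1, 1, 1, 1],       # 1 = real token, 0 = padding
}
labels = {
    "start_positions": 1,  # index of the answer's first token
    "end_positions": 2,    # index of the answer's last token
}
example = (features, labels)
print(sorted(example[0]))  # ['attention_mask', 'input_ids']
print(sorted(example[1]))  # ['end_positions', 'start_positions']
```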
# Define the batch size
batch_size = 5  # Adjust this value as needed

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
# Optionally uncomment the next line for float16 training
# tf.keras.mixed_precision.set_global_policy("mixed_float16")
model.compile(optimizer=optimizer)

train_set = train_set.batch(batch_size)
validation_set = validation_set.batch(batch_size)
model.fit(train_set, validation_data=validation_set, epochs=1)
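One caveat with the `from_tensor_slices` approach: unlike `prepare_tf_dataset` with a data collator, it does no dynamic padding, so every tokenized example must already have the same length (e.g. padded to `max_length` during tokenization). A small stdlib-only sanity check (hypothetical helper name) you could run before converting:

```python
def all_same_length(sequences):
    """True if every tokenized sequence has the same length,
    a precondition for tf.data.Dataset.from_tensor_slices."""
    return len({len(s) for s in sequences}) <= 1

print(all_same_length([[1, 2, 3], [4, 5, 6]]))  # True: safe to convert
print(all_same_length([[1, 2], [3]]))           # False: pad first
```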
Hope this helps.
Thanks for sharing, I’ll try it.