Dataset object has no attribute `to_tf_dataset`

I am following the Hugging Face Course and am currently on the chapter about fine-tuning a model.
Link: Fine-tuning a pretrained model - Hugging Face Course

I use a tokenize function and map, as shown in the course, to process the data.

# define a tokenize function
def Tokenize_function(example):
    return tokenizer(example['sentence'], truncation=True)

# tokenize entire data
tokenized_data = raw_data.map(Tokenize_function, batched=True)

At this point I have a Dataset object. When I try to convert it to a TF dataset, as described in the course, I get the following error.

# convert to TF dataset
train_data = tokenized_data["train"].to_tf_dataset(
    columns = ['attention_mask', 'input_ids', 'token_type_ids'],
    label_cols = ['label'],
    shuffle = True,
    collate_fn = data_collator,
    batch_size = 8
)

Output:

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/tmp/ipykernel_42/103099799.py in <module>
1 # convert to TF dataset
----> 2 train_data = tokenized_data["train"].to_tf_dataset( \
3 columns = ['attention_mask', 'input_ids', 'token_type_ids'], \
4 label_cols = ['label'], \
5 shuffle = True, \
AttributeError: 'Dataset' object has no attribute 'to_tf_dataset'

When I inspect dir(tokenized_data["train"]), there is no method or attribute named to_tf_dataset.

Why do I get this error, and how can I fix it?

Please help me.

Hey @rajkumar I believe the to_tf_dataset() method was only added in a recent version of datasets. Could you try upgrading to the latest version and check if the problem persists?
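
For anyone else hitting this, here is a rough sketch of how you could confirm which versions are installed and whether the method is present before upgrading. The helper names installed_version and dataset_has_to_tf_dataset are made up for illustration and are not part of datasets or transformers; the sketch only relies on the standard library and on the Dataset class that the error message refers to.

```python
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def dataset_has_to_tf_dataset():
    """Return True/False if `datasets` can be imported, or None if it cannot."""
    try:
        datasets = import_module("datasets")
    except ImportError:
        return None
    # Older releases of `datasets` do not define this method on Dataset.
    return hasattr(datasets.Dataset, "to_tf_dataset")


print("datasets version:", installed_version("datasets"))
print("transformers version:", installed_version("transformers"))
print("Dataset.to_tf_dataset available:", dataset_has_to_tf_dataset())
```

If the last line prints False, the installed datasets release most likely predates the method and an upgrade should help. In a notebook, a package that was already imported usually needs a kernel restart after upgrading before the new version is picked up.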


Hi @lewtun. You are absolutely correct. I upgraded transformers and datasets to the latest versions and the issue is resolved.

# upgrade transformers and datasets to latest versions
!pip install --upgrade transformers
!pip install --upgrade datasets

Thanks a lot for your timely reply.
