Hey @rajkumar, I believe the to_tf_dataset() method was only added in a recent version of datasets. Could you try upgrading to the latest version and check if the problem persists?
Hello, I am getting the same error, even though I am using newer versions of transformers and datasets:
UnexpectedStatusException: Error for Training job sample-huggingface-training-2022-02-25-22-41-34: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
ExitCode 1
ErrorMessage "AttributeError: 'Dataset' object has no attribute 'to_tf_dataset'"
These are the versions I am using:
sagemaker: 2.77.0
transformers: 4.11.0
tensorflow: 2.7.1
datasets: 1.18.3
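One thing that may be worth checking: the versions installed in a notebook are not necessarily the ones inside the SageMaker training container, so printing them from within the training script itself can help narrow this down. Below is a minimal sketch using only the standard library. The helper names `installed_versions` and `has_to_tf_dataset` are made up for this example, and the package list is just the one from this post.

```python
from importlib import metadata


def installed_versions(packages=("sagemaker", "transformers", "tensorflow", "datasets")):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def has_to_tf_dataset():
    """Return True if the datasets library seen by this interpreter exposes Dataset.to_tf_dataset."""
    try:
        from datasets import Dataset
    except ImportError:
        # datasets is not importable in this environment at all
        return False
    return hasattr(Dataset, "to_tf_dataset")


# Put these two lines at the top of the training script so the output
# shows up in the training job logs.
print(installed_versions())
print("Dataset.to_tf_dataset available:", has_to_tf_dataset())
```

If the second line prints False inside the training job, the script is running against an older datasets release than the one listed above, whatever is installed locally.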
Wow! It looks like many people are running into similar problems, because converting Pandas DataFrames to the tf.data.Dataset format with to_tf_dataset() seems to be difficult in general. So I followed this solution from Stack Overflow and it worked! I just had to repeat the same steps I had already been doing with Pandas DataFrames and tokenization. It would be nice if this were solved on the TensorFlow side, instead of having to use:
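from datasets import Dataset
# dataframe is an existing pandas DataFrame.
# Note: from_pandas() returns a Hugging Face datasets.Dataset, not a tf.data.Dataset.
tf_dataset = Dataset.from_pandas(dataframe)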