How to fine-tune models with my own dataset in TensorFlow?

I am trying to follow the tutorial here, but I want to use my own dataset.
I have stored the texts and labels in a pandas DataFrame named train, which has only two columns: text and labels.
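For reference, a toy stand-in for train looks like this (the rows here are made up for illustration; my real data just has the same two-column shape):

```python
import pandas as pd

# Toy version of my DataFrame: one text column, one integer label column
train = pd.DataFrame({
    "text": ["great movie", "terrible plot", "loved it"],
    "labels": [1, 0, 1],
})

print(list(train.columns))  # → ['text', 'labels']
```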
I have tried the following code:

train_tokenized = tokenizer(list(train.text), padding="max_length", truncation=True, return_tensors="tf")
train_features = {x: train_tokenized[x] for x in tokenizer.model_input_names}
train_tf_data = tf.data.Dataset.from_tensor_slices((train_features, train.labels))
train_tf_data = train_tf_data.batch(8)
model.fit(train_tf_data, epochs=3)

But it fails with: ValueError: Unsupported type BatchEncoding returned by IteratorSpec._serialize