Tensorflow Huggingface Datasets Equivalent to PyTorch

stevhliu · June 16, 2022, 4:10pm

Hi!

You can use the to_tf_dataset() method, which just got a nice rework. If your elements are all the same length, the built-in collator will batch them for you; otherwise you’ll need to pass a custom collator. You can just do:

tf_train = dataset.to_tf_dataset(columns=["input"], 
                                 label_cols=["labels"],
                                 batch_size=8,
                                 shuffle=True)
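
If your elements are not all the same length, a custom collator could look roughly like the sketch below. It is only an illustration: it assumes the "input" column holds lists of integer token ids, that "labels" holds one integer per example, and that padding with 0 is acceptable. The name pad_collate and the pad_value argument are made up for this example.

```python
import numpy as np

def pad_collate(features, pad_value=0):
    """Pad the variable-length "input" column so the batch stacks into one array.

    `features` is a list of per-example dicts, one dict per row in the batch.
    Assumes integer token ids in "input" and a scalar integer in "labels".
    """
    max_len = max(len(f["input"]) for f in features)
    # Start from an array full of the pad value, then copy each row in.
    inputs = np.full((len(features), max_len), pad_value, dtype=np.int64)
    for row, f in enumerate(features):
        inputs[row, : len(f["input"])] = f["input"]
    labels = np.array([f["labels"] for f in features], dtype=np.int64)
    return {"input": inputs, "labels": labels}

# Quick check on two rows of different lengths.
batch = pad_collate([
    {"input": [5, 6, 7], "labels": 1},
    {"input": [8], "labels": 0},
])
print(batch["input"].shape)  # (2, 3)
```

You would then pass it with collate_fn=pad_collate in the to_tf_dataset() call above. If you are working with a tokenizer from 🤗 Transformers, a ready-made collator such as DataCollatorWithPadding with return_tensors="np" is probably a better fit than hand-rolled padding.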

Check out the docs here for more details about
