Use tf.data.Dataset with HuggingFace datasets

jominmathew · March 22, 2021, 12:37pm

I was going through this tutorial Using a Dataset with PyTorch/Tensorflow — datasets 1.5.0 documentation .
The example is for PyTorch.
Do we have the same for TensorFlow?

eddie96 · March 22, 2021, 1:45pm

Well, there's a section for TensorFlow: in the top right corner of the page there's a toggle between TensorFlow and PyTorch, and the default is PyTorch.

This was taken from the official documentation; it is the TensorFlow version, btw:

>>> import tensorflow as tf
>>> from datasets import load_dataset
>>> from transformers import AutoTokenizer
>>> dataset = load_dataset('glue', 'mrpc', split='train')
>>> tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
>>> dataset = dataset.map(lambda e: tokenizer(e['sentence1'], truncation=True, padding='max_length'), batched=True)
>>>
>>> dataset.set_format(type='tensorflow', columns=['input_ids', 'token_type_ids', 'attention_mask', 'label'])
>>> features = {x: dataset[x].to_tensor(default_value=0, shape=[None, tokenizer.model_max_length]) for x in ['input_ids', 'token_type_ids', 'attention_mask']}
>>> tfdataset = tf.data.Dataset.from_tensor_slices((features, dataset["label"])).batch(32)
>>> next(iter(tfdataset))
({'input_ids': <tf.Tensor: shape=(32, 512), dtype=int32, numpy=
array([[  101,  7277,  2180, ...,
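If you want to try the same (features, labels) pattern without downloading GLUE or a tokenizer, here is a minimal sketch on toy data. The token ids, `max_length` and the `pad` helper are made up for illustration and are not part of the documentation snippet above; only `tf.data.Dataset.from_tensor_slices` and `.batch` are the same calls.

```python
import tensorflow as tf

# Hypothetical stand-in for tokenizer output: three short "sentences".
# The ids below are illustrative values, not real tokenizer results.
max_length = 8
encoded = {
    "input_ids": [[101, 7277, 2180, 102], [101, 1109, 102], [101, 146, 1821, 1303, 102]],
    "attention_mask": [[1, 1, 1, 1], [1, 1, 1], [1, 1, 1, 1, 1]],
}
labels = [1, 0, 1]


def pad(rows, length, value=0):
    """Right-pad every row with `value` so all rows have the same length."""
    return [row + [value] * (length - len(row)) for row in rows]


# Dict of dense int32 tensors, one entry per model input.
features = {
    name: tf.constant(pad(rows, max_length), dtype=tf.int32)
    for name, rows in encoded.items()
}

# Slice along the first axis so each element is (dict of features, label),
# then group the elements into batches of 2.
tfdataset = tf.data.Dataset.from_tensor_slices(
    (features, tf.constant(labels, dtype=tf.int64))
).batch(2)

batch_features, batch_labels = next(iter(tfdataset))
print(batch_features["input_ids"].shape)  # (2, 8)
```

Each element of `tfdataset` is a `(features_dict, labels)` pair, which is the structure Keras `model.fit` accepts for models with named inputs.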

jominmathew · March 22, 2021, 2:08pm

Thanks a lot
