GPU OOM when extracting features into a dict, following the fine-tuning documentation

I have seen other posts about GPU OOM during training, but I am having a problem earlier than that.

I am trying to fine-tune `bert-base-cased` using the TF model classes and the `amazon_polarity` dataset. For the input, I combine the title and content with a whitespace between them and feed the result to the tokenizer.
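For clarity, this is roughly the preprocessing I am doing before tokenization (a minimal sketch; the function name `combine_fields` is mine, not from the docs):

```python
def combine_fields(example):
    # join title and content with a single whitespace, as described above
    example["text"] = example["title"] + " " + example["content"]
    return example

sample = {"title": "Great book", "content": "Could not put it down."}
print(combine_fields(sample)["text"])  # -> "Great book Could not put it down."
```

The resulting "text" field is what I pass to the tokenizer (via `dataset.map`).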

However, when I get to the step where the following sequence is performed:

```python
dataset = hf_ds.remove_columns(rm_txt_fields).with_format("tensorflow")
features = {x: dataset[x] for x in tokenizer.model_input_names}
tfdataset = tf.data.Dataset.from_tensor_slices((features, dataset["labels"])).batch(32)
```

It fails when creating the features dictionary, with an error saying too much data is being allocated on the GPU. I wouldn't have expected anything to be allocated on the GPU at this point, at least not until data is actually being fed into the model.
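My understanding (which may be wrong) is that once the format is set, `dataset[x]` materializes the entire column as one tensor, so the dict comprehension tries to build full-dataset tensors up front. A toy sketch of what I think is happening, using a made-up class and plain lists in place of tensors:

```python
# Hypothetical stand-in: a "dataset" whose column access returns the whole
# column in one allocation, as I believe dataset[x] does after with_format.
class ToyDataset:
    def __init__(self, columns):
        self._columns = columns

    def __getitem__(self, name):
        # returns the ENTIRE column at once -- nothing is lazy here
        return list(self._columns[name])

ds = ToyDataset({"input_ids": [[1, 2], [3, 4]], "attention_mask": [[1, 1], [1, 1]]})
# the dict comprehension from my snippet, applied to the toy dataset:
features = {name: ds[name] for name in ["input_ids", "attention_mask"]}
print(len(features["input_ids"]))  # 2 -- both rows materialized immediately
```

With a real dataset of millions of rows, that eager materialization would explain the allocation failure, if my reading is right.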

Maybe I shouldn't be setting the dataset format to TensorFlow this early? This is what the fine-tuning docs showed. I am hoping to use the full dataset rather than just the tiny slice they used in the explanation. How should I go about doing that? I presume it needs to be batched somehow.
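In case it helps frame the question, the kind of batching I have in mind is just slicing the data so only one batch is in memory at a time (a pure-Python sketch of the idea, not an actual fix; `batched` is my own helper name):

```python
def batched(sequence, batch_size):
    # yield successive slices so only one batch is materialized at a time
    for start in range(0, len(sequence), batch_size):
        yield sequence[start:start + batch_size]

data = list(range(10))
print([list(b) for b in batched(data, 4)])  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Is there an equivalent way to get `tf.data` to pull batches from the HF dataset lazily, instead of converting everything into tensors first?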