TL;DR: I’m hitting an intermittent tensor-shape error when running predictions with my fine-tuned model.
Full story:
I followed this tutorial to fine-tune DistilBERT on a text classification dataset. I intermittently encounter the following error when running predictions with the fine-tuned model. Here is the prediction code:
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    DataCollatorWithPadding,
    TFAutoModelForSequenceClassification,
)

dataset = Dataset.from_pandas(X)
self.model = TFAutoModelForSequenceClassification.from_pretrained('my_model')
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def new_preprocess_function(examples):
    # Tokenize the text column; truncation caps sequences at the model max length
    return tokenizer(examples["text"], truncation=True)

tokenized_dict = dataset.map(new_preprocess_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, return_tensors="tf")
tf_predict_set = tokenized_dict.to_tf_dataset(
    columns=["attention_mask", "input_ids", "label"],
    shuffle=True,
    batch_size=16,
    collate_fn=data_collator,
)
self.model.predict(tf_predict_set).logits
And the error is:
Graph execution error:
Shape of tensor args_0 [16,449] is not compatible with expected shape [?,512].
[[{{node EnsureShape_1}}]]
[[IteratorGetNext]] [Op:__inference_predict_function_52036]
Why does TensorFlow expect 512? Aren’t the tensor dimensions (batch_size, max_len_of_sentence_in_the_batch)? I know BERT cannot accept input sentences longer than 512 tokens, but 449 is fewer than 512.
How can I ensure my tensor dimensions are (batch_size, 512)?
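For context, here is a toy sketch of what I believe fixed-length padding would produce (plain Python, no libraries; the pad token id 0 and the helper name are my own assumptions, just to check my understanding of the shapes):

```python
def pad_to_max_length(batch, max_length=512, pad_id=0):
    """Pad every tokenized sequence in a batch to a fixed max_length,
    mimicking what tokenizer(..., padding="max_length", max_length=512)
    would do, so every batch has shape (batch_size, 512)."""
    padded = []
    for seq in batch:
        if len(seq) > max_length:
            seq = seq[:max_length]  # truncate, like truncation=True
        padded.append(seq + [pad_id] * (max_length - len(seq)))
    return padded

# Two short sequences of different lengths both come out at length 512
batch = [[101, 2023, 102], [101, 2023, 2003, 102]]
out = pad_to_max_length(batch)
```

If that is the right mental model, then my current `DataCollatorWithPadding` setup pads only to the longest sequence in each batch, which would explain the [16, 449] shape, but I’m not sure why the graph expects a fixed 512.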