InvalidArgumentError with vit-base-patch16-224 model?

mbluetail · June 16, 2022, 7:08pm

Hi. I am using a pretrained model based on Google’s vit-base-patch16-224-in21k for binary classification of images (human vs. non-human).
I am using Keras/tensorflow 2.6.0 API.

Here are some parts of my code.

# Downloading the base model
base_model = TFViTModel.from_pretrained('google/vit-base-patch16-224-in21k')

# Flipping and rotating images
data_augmentation = keras.Sequential(
    [layers.RandomFlip("horizontal"), layers.RandomRotation(0.1),]
)
# Freeze base model
base_model.trainable = False
# Create new model
inputs = keras.Input(shape = (3, 224, 224))
x = data_augmentation(inputs)   # apply data augmentation

x = base_model(x, training=False)[0]
outputs = tf.keras.layers.Dense(1, activation='sigmoid')(x)


# model
model_vit = tf.keras.Model(inputs, outputs)
model_vit.compile(loss='binary_crossentropy',optimizer='adam', metrics=['accuracy'])

model_vit.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_6 (InputLayer)         [(None, 3, 224, 224)]     0         
_________________________________________________________________
sequential_1 (Sequential)    (None, 3, 224, 224)       0         
_________________________________________________________________
tf_vi_t_model (TFViTModel)   TFBaseModelOutputWithPool 86389248  
_________________________________________________________________
dense_2 (Dense)              (None, 197, 1)            769       
=================================================================
Total params: 86,390,017
Trainable params: 769
Non-trainable params: 86,389,248

There are lots of non-trainable parameters by the way.
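
A quick sanity check on the numbers in the summary above, as a rough sketch. The image size (224) and patch size (16) come from the checkpoint name; the hidden size of 768 is the standard ViT-Base value and is an assumption here, not something printed in the post.

```python
# Sanity check of the shapes and parameter counts printed by model_vit.summary().
# Assumption: hidden size 768 (standard ViT-Base); not stated in the post.
image_size = 224
patch_size = 16
hidden_size = 768

num_patches = (image_size // patch_size) ** 2   # 14 * 14 = 196 patches
seq_len = num_patches + 1                        # plus one [CLS] token -> 197

# Dense(1) applied to the last axis: one weight per hidden unit plus a bias.
dense_params = hidden_size * 1 + 1

print(seq_len)       # 197
print(dense_params)  # 769
```

The 197 in the Dense output shape `(None, 197, 1)` is therefore the token sequence length: index `[0]` of the model output is the per-token hidden state, so the head predicts one value per token rather than one per image. With labels of shape `(N, 1)`, a per-image representation (for example the first token, `x[:, 0, :]`, or the pooled output) is probably what is wanted; treat that as a suggestion rather than a confirmed fix. The large non-trainable count is expected, because `base_model.trainable = False` freezes all 86,389,248 backbone weights.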

When I run the training I get this error:

# Train the Vit model
vit_trained_model = model_vit.fit( X_train_images, y_train_labels, validation_data=(X_val_images, y_val_labels), batch_size = 8, verbose=2, epochs=50)

scores = model_vit.evaluate(test_images, test_labels_binary, verbose=0)
print("ViT Model Accuracy on Test Set: %.2f%%" % (scores[1]*100))


---------------------------------------------------------------------------
InternalError                             Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_28616\3601201585.py in <cell line: 2>()
      1 # Train the Vit model
----> 2 vit_trained_model = model_vit.fit( X_train_images, y_train_labels, validation_data=(X_val_images, y_val_labels), batch_size = 8, verbose=2, epochs=50)
      3 
      4 scores = model_vit.evaluate(test_images, test_labels_binary, verbose=0)
      5 print("Xception Accuracy on Test Set: %.2f%%" % (scores[1]*100))
....

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\framework\constant_op.py in convert_to_eager_tensor(value, ctx, dtype)
    104       dtype = dtypes.as_dtype(dtype).as_datatype_enum
    105   ctx.ensure_initialized()
--> 106   return ops.EagerTensor(value, ctx.device_name, dtype)
    107 
    108 

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

This error most likely means that my computer is out of memory. Fine, another approach is to use generators and define the batch size manually:

from tensorflow.keras.utils import Sequence
import numpy as np   

class DataGenerator(Sequence):
    def __init__(self, x_set, y_set, batch_size):
        self.x, self.y = x_set, y_set
        self.batch_size = batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]
        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]
        return batch_x, batch_y

train_gen = DataGenerator(X_train_images, y_train_labels, 16)
test_gen = DataGenerator(X_val_images, y_val_labels, 16)

history = model_vit.fit(train_gen,
                    epochs=6,
                    validation_data=test_gen)

This gives an error relating to the input shapes:

Epoch 1/6
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_13140\3904230856.py in <cell line: 21>()
     19 
     20 
---> 21 history = model_vit.fit(train_gen,
     22                     epochs=6,
     23                     validation_data=test_gen)

C:\ProgramData\Anaconda3\lib\site-packages\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1182                 _r=1):
   1183               callbacks.on_train_batch_begin(step)
-> 1184               tmp_logs = self.train_function(iterator)
   1185               if data_handler.should_sync:
   1186                 context.async_wait()

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self, *args, **kwds)
    883 
    884       with OptionalXlaContext(self._jit_compile):
--> 885         result = self._call(*args, **kwds)
    886 
    887       new_tracing_count = self.experimental_get_tracing_count()

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self, *args, **kwds)
    948         # Lifting succeeded, so variables are initialized and we can run the
    949         # stateless function.
--> 950         return self._stateless_fn(*args, **kwds)
    951     else:
    952       _, _, _, filtered_flat_args = \

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\eager\function.py in __call__(self, *args, **kwargs)
   3037       (graph_function,
   3038        filtered_flat_args) = self._maybe_define_function(args, kwargs)
-> 3039     return graph_function._call_flat(
   3040         filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
   3041 

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
   1961         and executing_eagerly):
   1962       # No tape is watching; skip to running the function.
-> 1963       return self._build_call_outputs(self._inference_function.call(
   1964           ctx, args, cancellation_manager=cancellation_manager))
   1965     forward_backward = self._select_forward_and_backward_functions(

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\eager\function.py in call(self, ctx, args, cancellation_manager)
    589       with _InterpolateFunctionError(self):
    590         if cancellation_manager is None:
--> 591           outputs = execute.execute(
    592               str(self.signature.name),
    593               num_outputs=self._num_outputs,

C:\ProgramData\Anaconda3\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     57   try:
     58     ctx.ensure_initialized()
---> 59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
     60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:

InvalidArgumentError:  input depth must be evenly divisible by filter depth: 224 vs 3
	 [[node model/tf_vi_t_model/vit/embeddings/patch_embeddings/projection/Conv2D (defined at ProgramData\Anaconda3\lib\site-packages\transformers\models\vit\modeling_tf_vit.py:199) ]] [Op:__inference_train_function_30507]

Errors may have originated from an input operation.
Input Source operations connected to node model/tf_vi_t_model/vit/embeddings/patch_embeddings/projection/Conv2D:
 model/tf_vi_t_model/vit/embeddings/patch_embeddings/transpose (defined at ProgramData\Anaconda3\lib\site-packages\transformers\models\vit\modeling_tf_vit.py:197)

Function call stack:
train_function

Can anyone explain what "input depth must be evenly divisible by filter depth: 224 vs 3" means here and how to fix it?
The shapes of my train and validation data are as follows:

Train: X_train_images=(3932, 224, 224, 3), y_train_labels=(3932, 1)
Validation: X_val_images=(800, 224, 224, 3), y_val_labels=(800, 1)

It is my first time experimenting with a ViT transfer model! Thank you very much. Any other advice on my model architecture is welcome too.

P.S. I used this article as a guide to install Hugging Face Transformers through Anaconda and run a model on my dataset of images: https://www.philschmid.de/image-classification-huggingface-transformers-keras

ellivalla · July 18, 2022, 7:25am

Hi! I encountered a similar problem. Did you manage to find the solution?

Shir2 · November 17, 2022, 8:39am

Hey @mbluetail @ellivalla,
did you find the solution?

nielsr · November 17, 2022, 12:49pm

cc @joaogante @amyeroberts @sayakpaul

sayakpaul · November 17, 2022, 1:03pm

Could you amend your code following this one?

https://github.com/sayakpaul/TF-2.0-Hacks/blob/master/data2vec_vision_image_classification.ipynb

Shir2 · November 20, 2022, 10:06am

@sayakpaul Hey, yes I can do it.
I noticed two differences:

1. You used TFData2VecVisionModel instead of TFAutoModelForImageClassification.
2. You added these lines to the code:

# Transpose because the `transformers` model has a leading channel dimension.
  dataset = dataset.map(lambda x, y: (tf.transpose(x, [0, 3, 1, 2]), y), AUTO)
  return dataset.prefetch(AUTO)

I accept the first change.
I had some difficulty applying the second change to my own dataset.
I load my dataset with this code:

import tensorflow as tf

# IMAGE_SIZE and BATCH_SIZE are defined earlier (not shown).
data_dir = "/home/data/train"
datagen_kwargs = dict(dtype='float32', validation_split=.20)
dataflow_kwargs = dict(target_size=(IMAGE_SIZE, IMAGE_SIZE), batch_size=BATCH_SIZE,
                       interpolation="bicubic")

# Validation generator: no augmentation.
valid_datagen = tf.keras.preprocessing.image.ImageDataGenerator(**datagen_kwargs)
valid_generator = valid_datagen.flow_from_directory(
    data_dir, subset="validation", shuffle=True, **dataflow_kwargs)

# Training generator: optionally augmented.
do_data_augmentation = True
if do_data_augmentation:
    percent = 0.1
    train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rotation_range=30,
        horizontal_flip=True,
        width_shift_range=percent, height_shift_range=percent,
        shear_range=percent, zoom_range=percent,
        **datagen_kwargs)
else:
    train_datagen = valid_datagen
train_generator = train_datagen.flow_from_directory(
    data_dir, subset="training", shuffle=True, **dataflow_kwargs)

When I pass train_generator and valid_generator to model.fit(), I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_2432028/1086025199.py in <module>
----> 1 history = model.fit(train_generator, validation_data=valid_generator, epochs=num_train_epochs)

~/.local/lib/python3.8/site-packages/keras/utils/traceback_utils.py in error_handler(*args, **kwargs)
     65     except Exception as e:  # pylint: disable=broad-except
     66       filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67       raise e.with_traceback(filtered_tb) from None
     68     finally:
     69       del filtered_tb

/usr/local/lib/python3.8/dist-packages/keras_preprocessing/image/iterator.py in __getitem__(self, idx)
     63         index_array = self.index_array[self.batch_size * idx:
     64                                        self.batch_size * (idx + 1)]
---> 65         return self._get_batches_of_transformed_samples(index_array)
     66 
     67     def __len__(self):

/usr/local/lib/python3.8/dist-packages/keras_preprocessing/image/iterator.py in _get_batches_of_transformed_samples(self, index_array)
    220             A batch of transformed samples.
    221         """
--> 222         batch_x = np.zeros((len(index_array),) + self.image_shape, dtype=self.dtype)
    223         # build batch of image data
    224         # self.filepaths is dynamic, is better to call it once outside the loop

TypeError: 'list' object cannot be interpreted as an integer

What can I do?

sayakpaul · November 20, 2022, 2:39pm

I would suggest turning your ImageDataGenerator into a tf.data pipeline.

If you have numpy arrays then you can do something like the following:

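import tensorflow as tf

def pp(x, y):
    # Channels-last (batch, H, W, C) -> channels-first (batch, C, H, W).
    x = tf.transpose(x, [0, 3, 1, 2])
    return {"pixel_values": x, "labels": y}


# x_train, y_train and BATCH_SIZE are assumed to be defined already.
tf_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))
tf_dataset = tf_dataset.batch(BATCH_SIZE)  # batch first: pp expects 4-D input
tf_dataset = tf_dataset.map(pp)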

The above assumes that x_train is a NumPy array consisting of the images and y_train is a NumPy array consisting of the labels.
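
To see what the transpose in `pp` does to the shapes, here is a small NumPy-only sketch. The array is a tiny random stand-in for `x_train` (4 images instead of 3932), so the names and sizes are illustrative assumptions.

```python
import numpy as np

# Tiny stand-in for x_train: 4 channels-last images (batch, height, width, channels).
x_batch = np.random.rand(4, 224, 224, 3).astype("float32")

# Reorder the axes to channels-first (batch, channels, height, width),
# which is the layout the TF ViT model in `transformers` expects for pixel_values.
x_channels_first = np.transpose(x_batch, (0, 3, 1, 2))

print(x_batch.shape)           # (4, 224, 224, 3)
print(x_channels_first.shape)  # (4, 3, 224, 224)

# The pixel values are unchanged, only their layout differs.
assert np.array_equal(x_channels_first[0, :, 5, 7], x_batch[0, 5, 7, :])
```

Feeding the channels-last array without this step is what makes the patch-embedding convolution see 224 where it expects 3 channels, which matches the "224 vs 3" in the error message from the original post.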

raygx · April 20, 2024, 10:58am

@sayakpaul
I checked your example and the code in the github link.
One thing it doesn’t address is that, when augmenting, we don’t want to stick to one fixed set of augmented data. The basic idea is that the images are transformed differently in every epoch. If they are not, the transformation in your example is static, which is similar to not applying any transformations.

I have been banging my head against this for weeks now, trying to figure a way out. If only there were a way to use a Hugging Face model as a layer in a new model, as you would normally do with Keras (TensorFlow) models.

If you have any insights in that direction, please help me out.
Thanks.
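
One possible way to get fresh augmentations every epoch, sketched under assumptions: do the random transform inside the batch loader, so it is re-drawn each time a batch is requested. The class below is a hypothetical NumPy-only illustration (`AugmentingBatches` and its arguments are made-up names). For use with `model.fit` it would need to subclass `tensorflow.keras.utils.Sequence`, and it has not been tested against the ViT model from this thread.

```python
import numpy as np

class AugmentingBatches:
    """Hypothetical batch loader: random horizontal flip per request, then channels-first."""

    def __init__(self, x, y, batch_size, seed=0):
        self.x, self.y = x, y            # x: (N, H, W, C) channels-last images
        self.batch_size = batch_size
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        # Number of batches per epoch, counting a final partial batch.
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, idx):
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        batch_x, batch_y = self.x[sl].copy(), self.y[sl]
        # Draw a new flip decision per image on every call, so each epoch differs.
        flip = self.rng.random(len(batch_x)) < 0.5
        batch_x[flip] = batch_x[flip, :, ::-1, :]   # reverse the width axis
        # Convert to the channels-first layout expected by the ViT model.
        return np.transpose(batch_x, (0, 3, 1, 2)), batch_y


# Tiny stand-in data: 10 images of 8x8 pixels with 3 channels.
x_demo = np.random.rand(10, 8, 8, 3).astype("float32")
y_demo = np.zeros((10, 1), dtype="float32")
batches = AugmentingBatches(x_demo, y_demo, batch_size=4)

first_x, first_y = batches[0]
print(len(batches))    # 3
print(first_x.shape)   # (4, 3, 8, 8)
```

The same idea should carry over to a `tf.data` pipeline: random ops placed inside `dataset.map(...)` are re-executed on every pass over the dataset, provided no `cache()` is applied after them.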
