@philschmid thank you, I took your advice and ran some plain experiments in the Notebook.
However, I am having trouble putting my multimodal text and image inputs together so that they are accepted by the model. For now I am running a small-scale test on ten data points to get the input preparation right.
This is what I have done:
- Generated tokens of my text:
input_ids, token_type_ids, attention_mask
I used the “bert-base-uncased” tokenizer for this.
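For reference, the tokenization step looks roughly like this (a sketch; texts stands in for my actual list of ten strings):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# "texts" is a placeholder for my list of ten input strings.
encoding = tokenizer(texts, padding="max_length", truncation=True, return_tensors="pt")

input_ids = encoding["input_ids"]
token_type_ids = encoding["token_type_ids"]
attention_mask = encoding["attention_mask"]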
- Generated visual embeddings of my images:
visual_embeds, visual_token_type_ids, visual_attention_mask
I followed this example and adapted it to my data to generate the visual embeddings (using the detectron2 library): Google Colab
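The tensors I end up with have roughly these shapes (dummy values here; 36 boxes and a 1024-dim feature are just assumptions, the real numbers depend on the detectron2 backbone):

import torch

num_images, num_boxes, feat_dim = 10, 36, 1024  # feat_dim depends on the detectron2 backbone
visual_embeds = torch.randn(num_images, num_boxes, feat_dim)
visual_token_type_ids = torch.ones(num_images, num_boxes, dtype=torch.long)
visual_attention_mask = torch.ones(num_images, num_boxes, dtype=torch.float)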
- I put all six of these tensors and the labels (also a tensor) into a dictionary and converted it into a Dataset, which I then split into a train and a test Dataset.
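Concretely, the conversion looks roughly like this (a sketch; labels is my tensor of answer labels, the other tensors come from the steps above):

from datasets import Dataset

data = {
    "labels": labels,
    "input_ids": input_ids,
    "token_type_ids": token_type_ids,
    "attention_mask": attention_mask,
    "visual_embeds": visual_embeds,
    "visual_token_type_ids": visual_token_type_ids,
    "visual_attention_mask": visual_attention_mask,
}
# Convert tensors to lists for from_dict, then restore torch tensors on indexing.
dataset = Dataset.from_dict({k: v.tolist() for k, v in data.items()})
splits = dataset.train_test_split(test_size=0.2)  # 10 rows -> 8 train / 2 test
train = splits["train"].with_format("torch")
test = splits["test"].with_format("torch")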
Here is what my training dataset looks like:
Dataset({
features: ['labels', 'input_ids', 'token_type_ids', 'attention_mask', 'visual_embeds', 'visual_token_type_ids', 'visual_attention_mask'],
num_rows: 8
})
- I then perform training as follows:
from transformers import BertTokenizer, VisualBertForMultipleChoice, TrainingArguments, Trainer
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = VisualBertForMultipleChoice.from_pretrained("uclanlp/visualbert-vcr")

# Default hyperparameters; only the output directory is set explicitly.
training_args = TrainingArguments(
    output_dir="output_dir/",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train,
    eval_dataset=test,
)

trainer.train()
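I left TrainingArguments at its defaults apart from output_dir, which (as the log below shows) gives 3 epochs and a per-device batch size of 8.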
The issue is that when I run trainer.train() I receive the following error message:
/home/ec2-user/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
FutureWarning,
***** Running training *****
Num examples = 8
Num Epochs = 3
Instantaneous batch size per device = 8
Total train batch size (w. parallel, distributed & accumulation) = 8
Gradient Accumulation steps = 1
Total optimization steps = 3
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-25-3435b262f1ae> in <module>
----> 1 trainer.train()
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
1398 tr_loss_step = self.training_step(model, inputs)
1399 else:
-> 1400 tr_loss_step = self.training_step(model, inputs)
1401
1402 if (
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/transformers/trainer.py in training_step(self, model, inputs)
1982
1983 with self.autocast_smart_context_manager():
-> 1984 loss = self.compute_loss(model, inputs)
1985
1986 if self.args.n_gpu > 1:
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs)
2014 else:
2015 labels = None
-> 2016 outputs = model(**inputs)
2017 # Save past state if it exists
2018 # TODO: this needs to be fixed and made cleaner later.
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/transformers/models/visual_bert/modeling_visual_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, visual_embeds, visual_attention_mask, visual_token_type_ids, image_text_alignment, output_attentions, output_hidden_states, return_dict, labels)
1143 output_attentions=output_attentions,
1144 output_hidden_states=output_hidden_states,
-> 1145 return_dict=return_dict,
1146 )
1147
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/transformers/models/visual_bert/modeling_visual_bert.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, visual_embeds, visual_attention_mask, visual_token_type_ids, image_text_alignment, output_attentions, output_hidden_states, return_dict)
821 visual_embeds=visual_embeds,
822 visual_token_type_ids=visual_token_type_ids,
--> 823 image_text_alignment=image_text_alignment,
824 )
825
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/transformers/models/visual_bert/modeling_visual_bert.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, visual_embeds, visual_token_type_ids, image_text_alignment)
140 )
141
--> 142 visual_embeds = self.visual_projection(visual_embeds)
143 visual_token_type_embeddings = self.visual_token_type_embeddings(visual_token_type_ids)
144
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
725 result = self._slow_forward(*input, **kwargs)
726 else:
--> 727 result = self.forward(*input, **kwargs)
728 for hook in itertools.chain(
729 _global_forward_hooks.values(),
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
91
92 def forward(self, input: Tensor) -> Tensor:
---> 93 return F.linear(input, self.weight, self.bias)
94
95 def extra_repr(self) -> str:
~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
1690 ret = torch.addmm(bias, input, weight.t())
1691 else:
-> 1692 output = input.matmul(weight.t())
1693 if bias is not None:
1694 output += bias
RuntimeError: mat1 dim 1 must match mat2 dim 0
The issue seems to originate from image_text_alignment and visual_embeds.
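For reference, this is the quick check I would run to compare my feature dimension against what the model's visual projection expects (a sketch; train is the training Dataset from above):

sample = train[0]
print(sample["visual_embeds"].shape)        # shape of my detectron2 features
print(model.config.visual_embedding_dim)    # input dimension of the model's visual projection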
- Any ideas on how to resolve this error?
- Do the overall steps sound like a good approach to prepare the data, or would you try something else?
Thanks!