LXMERT pre-trained model

Hello, congrats to all contributors for the awesome work on LXMERT! It is exciting to see multimodal transformers coming to huggingface/transformers. Of course, I immediately tried it out and played with the demo.

Question:
Does the line lxmert_base = LxmertForPreTraining.from_pretrained("unc-nlp/lxmert-base-uncased") load an already pre-trained LXMERT model on the tasks enumerated in the original paper “(1) masked crossmodality language modeling, (2) masked object prediction via RoI-feature regression, (3) masked object prediction via detected-label classification, (4) cross-modality matching, and (5) image question answering.” (Tan & Bansal, 2019)?

Tagging our LXMERT specialist @lysandre

This question has been answered on GitHub here by @eltoto1219, the author of the huggingface implementation of LXMERT.


Hello @lysandre, thanks for tagging the right person. Here is my GitHub response, along with a new question:

Hello @eltoto1219, thank you for the answer! I suppose it was a weird question on my part; I was asking to make sure that I am loading a pre-trained LXMERT model and not some random weights. Especially because I look at output_lxmert['cross_relationship_score'] for COCO images and captions (so not for some out-of-distribution images and captions) after loading LXMERT with the aforementioned code lxmert_base = LxmertForPreTraining.from_pretrained("unc-nlp/lxmert-base-uncased"). It seems that on cross-modality matching LXMERT performs at 50% accuracy (random guessing), so I wanted to make sure that I am loading weights pre-trained on (4) cross-modality matching in the first place.

New question: Do you know how it can be that LXMERT guesses randomly on cross-modality matching, even though it was pre-trained to output a score (after the softmax, of course) smaller than 0.5 if the caption does not describe the image, and a score bigger than 0.5 if the caption and the image match?
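For clarity, here is a minimal sketch of the interpretation I have in mind: cross_relationship_score contains two logits per image-caption pair, and a softmax over them gives a match probability. Note that the index ordering (0 = mismatch, 1 = match) and the example logit values below are assumptions of mine, not taken from the LXMERT code.

```python
import math

def softmax(logits):
    """Convert a list of raw logits to probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical cross_relationship_score logits for one image-caption pair.
# Assumed ordering: index 0 = "mismatch", index 1 = "match".
logits = [0.3, 1.2]
probs = softmax(logits)
match_prob = probs[1]

# Under this reading, the pair counts as "matching" when match_prob > 0.5,
# which for a two-way softmax is the same as logits[1] > logits[0].
print(match_prob > 0.5)
```

So if the model really guesses at chance level, the two logits must be nearly indistinguishable across COCO pairs, which is what puzzles me.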


Any visual question answering demo? Thanks.

Yes! Here it is

Hi Suraj, I am looking for a good starting point to fine-tune LXMERT for a VQA task on a custom dataset. Could you please point me to something? @valhalla