Multimodal transformer

Hi! I’m currently working with social media data that has 4 modalities: image, text (sentences plus context-free text like hashtags), categories, and time-series data (posting data per post, along with the username of who posted it). I explored the Hugging Face models for multimodal transformers and found that they all use only 2 modalities (text-text, image-text, or speech-text), or are graph transformers.

Can I take an image-text multimodal transformer and fine-tune it on my dataset with 4 modalities, where the input is a post’s information grouped by user? Any tips on whether that would work well, and how to go about it?
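In case it helps frame the question, here is a minimal sketch of the kind of late-fusion setup I have in mind: each modality gets its own projection into a shared dimension, each becomes one token, and a small transformer fuses them. Everything here is made up for illustration (the `LateFusionModel` class, all dimensions, the classification head); in practice the image and text embeddings would come from pretrained encoders such as CLIP or BERT rather than random tensors.

```python
import torch
import torch.nn as nn

class LateFusionModel(nn.Module):
    """Hypothetical sketch: project each modality's embedding to a shared
    width, treat each one as a token, and fuse them with a small
    transformer encoder. Not a real Hugging Face model."""

    def __init__(self, img_dim=512, txt_dim=768, n_categories=20,
                 time_dim=8, d_model=256, n_classes=2):
        super().__init__()
        # Per-modality projections into the shared d_model space
        self.img_proj = nn.Linear(img_dim, d_model)   # e.g. CLIP image embedding
        self.txt_proj = nn.Linear(txt_dim, d_model)   # e.g. BERT [CLS] embedding
        self.cat_emb = nn.Embedding(n_categories, d_model)  # category id
        self.time_proj = nn.Linear(time_dim, d_model)       # time-series features
        # Learned [CLS]-style token used for pooling after fusion
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, img_emb, txt_emb, cat_id, time_feats):
        # One token per modality -> (batch, 4, d_model)
        tokens = torch.stack([
            self.img_proj(img_emb),
            self.txt_proj(txt_emb),
            self.cat_emb(cat_id),
            self.time_proj(time_feats),
        ], dim=1)
        cls = self.cls.expand(tokens.size(0), -1, -1)
        fused = self.fusion(torch.cat([cls, tokens], dim=1))
        return self.head(fused[:, 0])  # predict from the fused CLS token

# Dummy batch of 3 posts, just to check the shapes flow through
model = LateFusionModel()
out = model(torch.randn(3, 512), torch.randn(3, 768),
            torch.randint(0, 20, (3,)), torch.randn(3, 8))
print(out.shape)  # torch.Size([3, 2])
```

The idea would be to keep the pretrained image and text encoders frozen (or lightly fine-tuned) and only train the projections and the fusion layers, and to extend the same pattern to a sequence of posts per user by adding one token per post instead of one token per modality.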