Hi all,
I’ve read papers where multimodal models (VLMs) take both text and image/video as inputs, but I’m not sure whether we can also feed in signal data (e.g., sensor readings) as a third input alongside the text and image/video. I’d like to hear your thoughts.
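For context, what I have in mind is treating the sensor stream as just another encoder branch that gets projected into the same embedding space before fusion. A toy NumPy sketch of that idea (the dimensions, the mean-pooling over time, and the random-projection "encoders" are all placeholder choices of mine, not from any specific paper):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 64  # shared embedding size (assumed)

def project(x, out_dim, seed):
    """Toy 'encoder': a fixed random linear projection standing in
    for a real text/image/signal encoder."""
    r = np.random.default_rng(seed)
    W = r.standard_normal((x.shape[-1], out_dim)) / np.sqrt(x.shape[-1])
    return x @ W

# Stand-ins for real encoder outputs (hypothetical dims).
text_feat = rng.standard_normal(384)        # e.g. a sentence embedding
image_feat = rng.standard_normal(512)       # e.g. a ViT [CLS] embedding
sensor_raw = rng.standard_normal((100, 6))  # 100 timesteps, 6 channels (e.g. IMU)

# Encode the sensor stream: here just mean-pool over time before projecting.
sensor_feat = sensor_raw.mean(axis=0)

# Project each modality into the shared space and fuse by concatenation;
# the fused vector would then go to the language model / task head.
fused = np.concatenate([
    project(text_feat, D, seed=1),
    project(image_feat, D, seed=2),
    project(sensor_feat, D, seed=3),
])
print(fused.shape)  # (192,)
```

So my question is really whether this kind of third branch (with a proper learned signal encoder instead of these placeholders) works in practice with VLMs.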