Is it possible to tokenize multiple text modalities?

erezsheffi · October 15, 2020, 5:38pm

Do the tokenizers in the transformers package support tokenization of triplets?
For example:

Let's assume we’re dealing with a VQA dataset. Each entry in the dataset contains the following information:

I would like to be able to represent each input as:
[CLS] + Q + [SEP] + A + [SEP] + CAPTIONS

UriVendict · September 1, 2022, 8:27am

Did you figure out an answer to this?
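
For reference, one possible workaround is to encode each text on its own without special tokens and then join the id lists by hand. The sketch below is only an illustration, not a built-in triplet mode of the library: `build_triplet_input`, `toy_encode`, and the `segment_index` field are hypothetical names, and the ids 101 and 102 are placeholders for whatever `[CLS]` and `[SEP]` ids the chosen tokenizer uses. It follows the layout from the question, so no trailing `[SEP]` is added.

```python
from typing import Callable, Dict, List


def build_triplet_input(
    encode: Callable[[str], List[int]],
    cls_id: int,
    sep_id: int,
    question: str,
    answer: str,
    captions: str,
) -> Dict[str, List[int]]:
    """Assemble [CLS] Q [SEP] A [SEP] CAPTIONS as a flat list of token ids.

    `encode` must return ids WITHOUT special tokens (hypothetical helper contract).
    """
    segments = [encode(question), encode(answer), encode(captions)]
    input_ids = [cls_id]
    segment_index = [0]  # [CLS] is counted as part of the question segment
    for i, ids in enumerate(segments):
        input_ids.extend(ids)
        segment_index.extend([i] * len(ids))
        if i < len(segments) - 1:  # separator between segments, none at the end
            input_ids.append(sep_id)
            segment_index.append(i)
    return {
        "input_ids": input_ids,
        "attention_mask": [1] * len(input_ids),
        "segment_index": segment_index,  # bookkeeping only: 0 = Q, 1 = A, 2 = captions
    }


# Toy stand-in encoder so the sketch runs offline: id = 1000 + word length.
def toy_encode(text: str) -> List[int]:
    return [1000 + len(word) for word in text.split()]


# 101 and 102 are placeholder ids for [CLS] and [SEP].
example = build_triplet_input(toy_encode, 101, 102, "what color", "red", "a red car")
print(example["input_ids"])

# With a real tokenizer from transformers, the same helper could be wired up as:
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("bert-base-uncased")
#   encode = lambda s: tok.encode(s, add_special_tokens=False)
#   build_triplet_input(encode, tok.cls_token_id, tok.sep_token_id, q, a, captions)
```

Note that `segment_index` is not necessarily usable as `token_type_ids`: BERT checkpoints ship with two segment embeddings, so a third segment id would need either a remapping to 0/1 or a resized embedding table.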
