Multiple texts as inputs to Transformers models

I would like to use multiple texts as inputs to a model. Let's say I have a dataset with 10 columns, where each column is a text (a sentence or two). How can I feed all these inputs to the model and do classification, for example?
I can see it's possible to just concatenate all the texts into one, but it seems that I would need a very large dataset to be able to achieve good accuracy.
Maybe use multiple models (BERT) in parallel, take their last hidden states, concatenate them, and classify? But the problem is that this produces a huge number of values, since I have on the order of 30 texts.

Any idea how to tackle this ?

You should take the same approach as extractive text summarization:

Concatenate all your sentences, separated by a special token ([CLS] for example), then use the [CLS] token representation to do classification.
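A minimal sketch of this layout, using a small randomly initialized encoder in place of a pretrained BERT (the token IDs and model dimensions here are made up for illustration; a real tokenizer assigns its own special-token IDs):

```python
import torch
import torch.nn as nn

# Hypothetical special-token IDs (BERT's tokenizer uses its own).
CLS, SEP = 101, 102

def build_input(sentence_ids):
    """Concatenate tokenized sentences as [CLS] sen1 [SEP] [CLS] sen2 [SEP] ..."""
    ids = []
    for sen in sentence_ids:
        ids += [CLS] + sen + [SEP]
    return torch.tensor([ids])  # batch of 1

class ConcatClassifier(nn.Module):
    def __init__(self, vocab_size=30522, d_model=64, n_classes=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, ids):
        h = self.encoder(self.embed(ids))  # (batch, seq_len, d_model)
        cls_repr = h[:, 0]                 # hidden state at the first [CLS]
        return self.head(cls_repr)         # (batch, n_classes)

ids = build_input([[7, 8, 9], [10, 11], [12]])  # three dummy "sentences"
logits = ConcatClassifier()(ids)
```

With a real model you would swap the toy encoder for a pretrained `BertModel` and take the same position-0 hidden state.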

From the PreSumm paper


Hi @astariul, thank you for your reply.
I understand what you suggested. The problem is that I don't only have texts as inputs, I also have some float values. Would converting these values to text be sufficient?

I see…

I never encountered this case myself, but maybe you can feed the float values directly into the last classifier?
Since they are not text, there is no need for BERT to encode them (?)
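One hedged way to wire that up is a sketch like the following, where the text encoder itself is left out and only the final classifier is shown (names and dimensions are assumptions, not a fixed recipe):

```python
import torch
import torch.nn as nn

class TextPlusFloatsClassifier(nn.Module):
    """Concatenate the encoder's [CLS] representation with raw float
    features right before the final classification layer."""
    def __init__(self, d_text=64, n_floats=4, n_classes=2):
        super().__init__()
        self.head = nn.Linear(d_text + n_floats, n_classes)

    def forward(self, cls_repr, float_feats):
        # cls_repr:    (batch, d_text), e.g. BERT's [CLS] hidden state
        # float_feats: (batch, n_floats), numeric columns fed in directly
        return self.head(torch.cat([cls_repr, float_feats], dim=-1))

model = TextPlusFloatsClassifier()
logits = model(torch.randn(8, 64), torch.randn(8, 4))  # (batch of 8)
```

In practice it may help to normalize the float features first, since the text representation and raw numbers can be on very different scales.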

Hi @Zack

I don't know whether you've tried / considered the Multimodal Toolkit (blog post, GitHub). It takes in tabular data (text, numbers, categorical data) and can use it as input to develop models.

Haven’t tried it myself, but looks quite promising.


Hello @astariul ,

Thank you for your answer. I am also trying to do something similar, but I had a question: even after concatenating the data, will the different inputs be given their own weights and biases in this case, or will the whole input text be given a single set of weights and biases?

You can do both.

In the case of BERT for summarization, they just use sentence-specific representations:
[CLS] sen1 [SEP] [CLS] sen2 [SEP] [CLS] sen3 [SEP]
Then use each CLS token as a representation of each sentence.

But if you want a general representation for the whole text, you can just train your model with an additional token at the beginning:
[CLS2] [CLS] sen1 [SEP] [CLS] sen2 [SEP] [CLS] sen3 [SEP]

Then use each [CLS] token for its sentence's representation, and [CLS2] for the whole-text representation.
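A small sketch of the two input layouts (the token IDs here are hypothetical; a real tokenizer would assign its own, and [CLS2] would have to be added to the vocabulary and embedding table):

```python
# Hypothetical IDs: 101/102 mimic BERT's [CLS]/[SEP]; 30000 is a new
# [CLS2] token appended to the vocabulary.
CLS, SEP, CLS2 = 101, 102, 30000

def build_input(sentence_ids, whole_text_token=False):
    """[CLS] sen1 [SEP] [CLS] sen2 [SEP] ..., optionally prefixed by [CLS2]."""
    ids = []
    for sen in sentence_ids:
        ids += [CLS] + sen + [SEP]
    if whole_text_token:
        ids = [CLS2] + ids
    return ids

def cls_positions(ids):
    """Indices whose hidden states serve as per-sentence representations."""
    return [i for i, t in enumerate(ids) if t == CLS]

ids = build_input([[7, 8], [9]], whole_text_token=True)
# position 0 holds [CLS2] (whole text); cls_positions(ids) gives sentences
```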

Hello @astariul, thank you for your reply again. I was just curious to know whether I can build a model using Hugging Face Transformers that gives me multiple outputs?

Yes you can.
Regular models have a single output head (usually an LM head or a classification head) on top of the Transformer stack, but you can just add several different heads on top of the same stack to get multiple outputs.
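As a sketch, assuming a pooled representation coming out of the Transformer body (the body here is a stand-in; in practice it would be e.g. a pretrained `BertModel`, with one loss per head summed during training):

```python
import torch
import torch.nn as nn

class MultiHeadModel(nn.Module):
    """One shared body, several task-specific heads."""
    def __init__(self, d_model=64, n_classes_a=3, n_classes_b=5):
        super().__init__()
        # Stand-in for the Transformer stack's pooling layer.
        self.body = nn.Sequential(nn.Linear(d_model, d_model), nn.Tanh())
        self.head_a = nn.Linear(d_model, n_classes_a)  # e.g. topic
        self.head_b = nn.Linear(d_model, n_classes_b)  # e.g. sentiment

    def forward(self, pooled):
        h = self.body(pooled)
        return self.head_a(h), self.head_b(h)  # one logits tensor per task

logits_a, logits_b = MultiHeadModel()(torch.randn(2, 64))  # batch of 2
```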