How to use transformer attention model when the input is features

seyeeet · October 10, 2020, 5:31pm

I am totally new to NLP, transformers, and attention.
I was playing with the sentence-transformers models and want to explore more, but now I am stuck.
I have an input of shape BxKx768, which is my embedded features.
Is there a way to give them to a transformer (which has an attention model) and get an output of size BxM, where M can be any number?

I learned how to do it when the input is a sentence, but have no idea how to do it when the input is features.
I guess what I am asking is how to give my input to a transformer model and get my output.
Apologies in advance if it is a bad question.

If that is easy, are there different models to try? For example, with resnet we have resnet 18, 50, etc. Do we have the same thing here?

PereLluis13 · October 12, 2020, 1:18pm

The models in the HF library are focused on NLP, so they come with extra language-related parts, such as an embeddings layer, positional and token-type information, and other model-specific features.

If you want to build your own model using Transformer layers then perhaps you should look at https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html.
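As a rough illustration of that suggestion, here is a minimal sketch that feeds a BxKx768 feature tensor through PyTorch's encoder layers and maps it to BxM. The class name FeatureTransformer, the mean pooling over K, the final linear layer, and all sizes (8 heads, 2 layers, M = 5) are assumptions made for this example, not something prescribed in the thread. No positional encoding is added here, so the order of the K features is ignored; add one if the order matters for your data.

```python
import torch
import torch.nn as nn


class FeatureTransformer(nn.Module):
    """Hypothetical wrapper: (B, K, 768) features -> (B, M) outputs."""

    def __init__(self, d_model=768, nhead=8, num_layers=2, out_dim=5):
        super().__init__()
        # batch_first=True makes the layer accept (B, K, d_model) input.
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        # Linear head that picks the output size M (assumed design choice).
        self.head = nn.Linear(d_model, out_dim)

    def forward(self, x):
        h = self.encoder(x)        # (B, K, d_model), self-attention over K
        pooled = h.mean(dim=1)     # (B, d_model), average over the K features
        return self.head(pooled)   # (B, M)


model = FeatureTransformer(out_dim=5)
model.eval()                       # disable dropout for a deterministic pass

x = torch.randn(4, 16, 768)        # B=4, K=16, feature size 768
with torch.no_grad():
    out = model(x)
print(out.shape)
```

Mean pooling is only one way to collapse the K dimension; taking the first position or using a learned pooling token would work with the same encoder.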

Regarding the last question, resnet18/50… are pre-trained models with different model sizes, see this post. With Transformers you can also change the size of the network by specifying a different number of attention heads or layers. For instance, the bert-base model is 12 layers deep with 12 attention heads per layer, while bert-large is 24 layers deep with 16 attention heads per layer. The embeddings they produce also differ in size, 768 vs 1024: the hidden size is a configuration value of its own, and in both models each head works on 64 dimensions (12 × 64 = 768, 16 × 64 = 1024). If you want to understand more I recommend this post. So I think the analogy would be bert-base being a Resnet-18 and bert-large a Resnet-50.
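To make the size comparison concrete, here is a small sketch that records the two configurations quoted above and derives the per-head dimension. The EncoderSize class is a hypothetical helper written for this example, not part of any library; the layer, head, and hidden-size numbers are the published bert-base and bert-large values.

```python
from dataclasses import dataclass


@dataclass
class EncoderSize:
    """Hypothetical container for the three size knobs discussed above."""
    num_layers: int
    num_heads: int
    hidden_size: int

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the hidden vector.
        return self.hidden_size // self.num_heads


bert_base = EncoderSize(num_layers=12, num_heads=12, hidden_size=768)
bert_large = EncoderSize(num_layers=24, num_heads=16, hidden_size=1024)

for name, cfg in [("bert-base", bert_base), ("bert-large", bert_large)]:
    print(name, cfg.num_layers, cfg.num_heads, cfg.hidden_size, cfg.head_dim)
```

Both models end up with the same per-head dimension, so the larger embedding in bert-large comes from having more heads of the same width.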

You could use pretrained Language Modelling models listed here for language related tasks as you would with pre-trained Resnets for computer vision tasks, although there are more task-specific trained models you can explore here.

As you can see, there's a lot happening, so welcome to the NLP world!
