The models in the HF library are focused on NLP, so they include extra language-related components such as an embeddings layer, positional and token type information, as well as other model-specific features.
If you want to build your own model using Transformer layers, then perhaps you should look at https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html.
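As a rough idea, here is a minimal sketch of using plain `torch.nn.Transformer` on its own, without any of the NLP extras mentioned above (the hyperparameters and tensor sizes are just illustrative):

```python
import torch
import torch.nn as nn

# A bare encoder-decoder Transformer: no embeddings, no positional or
# token type information, you would add those yourself.
model = nn.Transformer(
    d_model=512,            # size of each token representation
    nhead=8,                # number of attention heads per layer
    num_encoder_layers=6,   # depth of the encoder stack
    num_decoder_layers=6,   # depth of the decoder stack
)

# Dummy inputs with shape (sequence_length, batch_size, d_model)
src = torch.rand(10, 32, 512)
tgt = torch.rand(20, 32, 512)

out = model(src, tgt)
print(out.shape)  # torch.Size([20, 32, 512])
```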
Regarding the last question, resnet18/50… are pre-trained models of different sizes, see this post. With Transformers you can also change the size of the network by specifying a different number of layers or attention heads. For instance, the bert-base model is 12 layers deep with 12 attention heads per layer, while bert-large is 24 layers deep and uses 16 attention heads. That is also why the embeddings they produce differ in size, 768 vs 1024: the hidden size is a separate hyperparameter, but it is chosen so it splits evenly across the heads (64 dimensions per head in both cases). If you want to understand more I recommend this post. So I think the parallelism would be bert-base being a ResNet-18 and bert-large a ResNet-50.
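To make the size comparison concrete, here is a small sketch using the `transformers` library's `BertConfig`; the numbers below match the published bert-base / bert-large configurations, and the model is randomly initialised rather than pre-trained:

```python
from transformers import BertConfig, BertModel

# bert-base: 12 layers, 12 heads, hidden size 768
base_config = BertConfig(
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
)

# bert-large: 24 layers, 16 heads, hidden size 1024
large_config = BertConfig(
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,
)

# Each head works on hidden_size / num_attention_heads dimensions:
# 768 / 12 == 64 and 1024 / 16 == 64 per head in both cases.
base_model = BertModel(base_config)
print(sum(p.numel() for p in base_model.parameters()))  # roughly 110M parameters
```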
You could use the pre-trained language modelling models listed here for language-related tasks, just as you would use pre-trained ResNets for computer vision tasks, although there are also more task-specific trained models you can explore here.
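Loading one of those pre-trained models looks something like this (a minimal sketch; "bert-base-uncased" is just one example checkpoint from the hub):

```python
from transformers import AutoTokenizer, AutoModel

# Analogous to loading a pre-trained ResNet from torchvision
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, NLP world!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, num_tokens, 768)
```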
As you can see there's a lot happening, so welcome to the NLP world!