Transformers: am I only using an Encoder for Binary Classification?

Hi guys, I’ve got a basic beginner’s question. If I’m doing something like binary classification (sentiment analysis) of text and I’m using a Transformer (like BERT, for example), am I using both the encoder and the decoder part of the Transformer network? If I understood the basics right, I’m just using an encoder for sequence-to-binary, and sequence-to-sequence would use both encoder and decoder?

Thanks in advance.

Hi @unknownTransformer,

BERT uses only the Encoder.
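You can verify this yourself by inspecting the model’s module tree. A minimal sketch, assuming you have the transformers library and PyTorch installed (bert-base-uncased is just an example checkpoint):

```python
from transformers import BertModel

# Load the plain BERT model, with no task-specific head.
model = BertModel.from_pretrained("bert-base-uncased")

# The top-level modules are embeddings, an encoder stack, and a pooler.
# There is no decoder anywhere in the network.
print([name for name, _ in model.named_children()])
# ['embeddings', 'encoder', 'pooler']
```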

See this page of the docs: https://huggingface.co/transformers/model_summary.html

See also the Devlin paper: https://arxiv.org/abs/1810.04805

Also try Jay Alammar’s blogs, for example this one: alammar.github.io/illustrated-bert/

When you do sentiment analysis, you use the pretrained BERT encoder to turn your text into numerical representations, and then you fine-tune a final layer for your task. The Hugging Face models such as BertForSequenceClassification already include that “final” classification layer for you, which starts out randomly initialized. See this page: https://huggingface.co/transformers/model_doc/bert.html
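Here is a minimal sketch of that setup, assuming a recent version of transformers where the forward pass returns an output object with a `.logits` attribute:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Pretrained encoder weights plus a randomly initialized classification head.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # binary sentiment, e.g. 0 = negative, 1 = positive
)

# Tokenize one example sentence and run a forward pass.
inputs = tokenizer("I really enjoyed this movie!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Two logits, one per class. They are meaningless until you fine-tune,
# because the classification head starts from random weights.
print(outputs.logits)
```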

You will probably want to freeze most of the BERT layers while you fine-tune the last layer, at least initially; there is a quick sketch below. See this post and the reply by sgugger: How to freeze some layers of BertModel
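Freezing comes down to turning off gradients for the encoder’s parameters. A minimal sketch, assuming `model` is the BertForSequenceClassification instance from above:

```python
# Freeze the whole BERT encoder; only the classification head stays trainable.
for param in model.bert.parameters():
    param.requires_grad = False

# Sanity check: list what the optimizer will still update.
print([name for name, p in model.named_parameters() if p.requires_grad])
# ['classifier.weight', 'classifier.bias']
```

Once the head has converged, you can unfreeze some or all of the encoder layers and continue fine-tuning with a lower learning rate.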

Good luck with it all.
