Hi all, I’ve got a basic beginner’s question. If I’m doing binary classification (sentiment analysis) of text and I’m using a Transformer (BERT, for example), am I using both the encoder and the decoder part of the Transformer network? If I understood the basics right, I’d use just the encoder for sequence-to-binary tasks, while sequence-to-sequence tasks would use both the encoder and the decoder?
Thanks in advance.
Hi @unknownTransformer,
You’ve understood it right: BERT uses only the encoder. Sequence-to-sequence models (translation, summarization, etc.) use both an encoder and a decoder.
See this page of the docs: https://huggingface.co/transformers/model_summary.html
See also the Devlin paper: https://arxiv.org/abs/1810.04805
Also try Jay Alammar’s blog posts, for example this one: https://jalammar.github.io/illustrated-bert/
When you do sentiment analysis, you use the base BERT model to encode your text into numerical representations, and then you fine-tune a final layer for your task. The Hugging Face models such as BertForSequenceClassification already include that final classification layer for you, randomly initialized. See this page: https://huggingface.co/transformers/model_doc/bert.html
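To make that concrete, here is a minimal sketch of loading BERT with its classification head for a binary task. The checkpoint name and the example sentence are just illustrative assumptions:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load a pretrained encoder plus a randomly initialized classification head.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # binary sentiment: e.g. negative / positive
)

# Encode a sample sentence and run it through the model.
inputs = tokenizer("I really enjoyed this movie!", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

pred = torch.argmax(logits, dim=-1)  # meaningless until you fine-tune
```

Note that the prediction is random at this point, since only the BERT body is pretrained; the classifier head still has to be trained on your labeled data.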
You will probably want to freeze most of the BERT layers while you fine-tune the final layer, at least initially. See this post and the reply by sgugger: How to freeze some layers of BertModel
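A hedged sketch of what that freezing looks like, continuing from the `model` above (BertForSequenceClassification exposes the encoder as `model.bert` and the head as `model.classifier`):

```python
# Freeze the whole BERT body so only the classification head trains.
for param in model.bert.parameters():
    param.requires_grad = False

# Sanity check: only the head's parameters should remain trainable.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # e.g. ['classifier.weight', 'classifier.bias']
```

Once the head has converged you can unfreeze some or all of the encoder layers and continue fine-tuning end to end, usually with a smaller learning rate.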
Good luck with it all.