After learning about the different types of Transformer architectures (thanks a lot for this amazingly informative course!), I have a question about the output of encoder-only models:
The video says that (at least for BERT-like models) the output contains an encoding vector for each of the input words.
It is also said that these outputs are well suited for sentence classification tasks.
I see how an attention-based classifier would be able to use the encoder output as input, attending to the most relevant words for the classification task.
But this kind of encoding doesn't seem well suited as input for a linear classifier, right? A linear classifier would be sensitive to the position of the individual words in the output vector. Or am I overlooking something?
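To make the concern concrete, here is a rough sketch of what I mean (assuming `bert-base-uncased` and the `transformers` library; the two-class linear head is just a hypothetical example):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("I love this movie", return_tensors="pt")
outputs = model(**inputs)

# One vector per input token: shape (batch, seq_len, hidden_size)
token_vectors = outputs.last_hidden_state          # e.g. (1, 6, 768)

# A linear classifier over all token vectors would have to flatten them,
# so each weight ends up tied to a fixed token position:
flat = token_vectors.flatten(start_dim=1)          # (1, seq_len * 768)
linear_head = torch.nn.Linear(flat.shape[1], 2)    # hypothetical 2-class head
logits = linear_head(flat)

# If the same word appears one position earlier or later, it hits
# completely different weights, which is what I mean by "position-sensitive".
```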