Chapter 1 questions

Beyond the encoder-decoder architecture shared in Sequence-to-sequence models:

Is this how the dot product cross attention mechanism works?

During ‘dot product cross attention’,

  • a decoder layer performs a dot product of each encoder layer’s hidden state and the current decoder layer’s hidden state
  • decoder normalises and performs a softmax on the dot product → this gives us a ‘probability distribution’ of the encoder hidden states → Question 1: This ‘probability distribution’ corresponds to how similar the decoder’s Q and the encoder’s K are?
  • decoder weighs the probabilities of the encoder hidden states to get a context vector → Question 2: Where do the ‘weights’ for this step come from?
  • decoder uses the context vector to generate the next output token
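
A minimal sketch of the four steps above, in plain Python with toy numbers. It assumes single-head attention with no learned projections, so the decoder hidden state acts as the query and the encoder hidden states serve as both keys and values. The function names and the values are illustrative only and are not taken from the course.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(decoder_state, encoder_states):
    d_k = len(decoder_state)
    # Step 1: one dot product per encoder position, scaled by sqrt(d_k).
    scores = [dot(decoder_state, h) / math.sqrt(d_k) for h in encoder_states]
    # Step 2: softmax turns the scores into weights that sum to 1.
    weights = softmax(scores)
    # Step 3: the context vector is the weighted sum of the encoder states.
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(d_k)
    ]
    return weights, context

# Toy data (assumed): three encoder positions, hidden size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]

weights, context = cross_attention(decoder_state, encoder_states)
print(weights)  # similarity-based weights over the encoder positions
print(context)  # what the decoder would use for the next-token step
```

In this sketch the weights asked about in Question 2 are simply the softmax output from the previous step; no separate set of weights is introduced. In a full Transformer layer the queries, keys, and values are additionally produced by learned linear projections.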

I solved it by using a print command like this:

from transformers import pipeline
classifier = pipeline("sentiment-analysis")
result = classifier("I've been waiting for a HuggingFace course my whole life.")
print(result)

How to post a question here?
I wanted to ask what the token key means in the output here: Transformers, what can they do? - Hugging Face NLP Course

Hello! Newbie here. I’m having a lot of problems configuring the environment.
I started by installing the requirements:
pip install datasets evaluate transformers[sentencepiece]

Then I have this example:

from transformers import pipeline
classifier = pipeline("sentiment-analysis")
results = classifier([
    "I've been waiting for a HuggingFace course my whole life.",
    "I love using Hugging Face models!",
    "This movie is terrible.",
    "I love metal",
])
print(results)

I was asked to install PyTorch, so I installed it.
Then a new error appeared: I needed to install xformers.
I installed it.
And now I have this error:

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (
Using a pipeline without specifying a model name and revision in production is not recommended.
A matching Triton is not available, some optimizations will not be enabled.
Error caught was: No module named 'triton.language'

When I tried to install Triton I got this error:

PS C:\Users\Usuario\Desktop\Hugging face\triton\python> pip install triton
ERROR: Could not find a version that satisfies the requirement triton (from versions: none)
ERROR: No matching distribution found for triton

Any idea about this?

Hello, in chapter 1 under the ‘The original architecture’ header, it says

During training, the encoder receives inputs (sentences) in a certain language, while the decoder receives the same sentences in the desired target language.

I was wondering if it is true that ‘the decoder receives the same sentences in the desired target language’, as I thought that the decoder receives the features of the input sentence (as converted by the encoder), and instead produces the same sentence in the desired target language. Am I correct in this understanding or is the quoted sentence correct?
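
One common way to read the quoted sentence is teacher forcing: during training the decoder is given the target sentence shifted one position to the right, in addition to the encoder's output, and is asked to predict the next target token at each position. The sketch below only illustrates that shift on a toy token list. The `<bos>` and `<eos>` markers, the example sentence, and the function name are assumptions for illustration, not the course's code.

```python
def make_decoder_training_pair(target_tokens, bos="<bos>", eos="<eos>"):
    # Decoder input: the target sentence shifted right behind a start marker.
    decoder_input = [bos] + target_tokens
    # Labels: the same target sentence followed by an end marker.
    labels = target_tokens + [eos]
    return decoder_input, labels

# Toy target sentence (assumed), already split into tokens.
target = ["Ich", "bin", "Student"]
decoder_input, labels = make_decoder_training_pair(target)

# Pair each decoder input token with the label at the same position.
for seen, expected in zip(decoder_input, labels):
    print(f"decoder input {seen!r} -> label {expected!r}")
```

Under this reading both statements hold: the decoder receives the target sentence as input during training and also uses the encoder's features of the source sentence. At inference time there is no target sentence, so the decoder consumes its own previous outputs instead.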

I just started the NLP course. Most of the links in the first chapter lead to 404 errors, which is “a little bit” annoying. Example: links at the bottom of this page.
It would be great if someone could fix them. I guess the cause is some reorganization of the Hugging Face site.

Does your site have ChatGPT 4? I want you to act as if you were the site's operator.

How do I post a comment in this forum? Which chapter covers question answering with NLP?

Hey everyone, I would like to be considered for translating the chapters into my native language (Yoruba).

I have a question about the Transformer architecture. I think there are multiple decoder stacks and multiple encoder stacks in the architecture. Does every decoder stack receive information from the encoders? And if so, do they all receive the same information from the last encoder? Or if not, does only the first decoder receive information from the last encoder?

Not sure what I am doing wrong, but when I replace ‘fill-mask’ with the “bert-base-cased” model as suggested in chapter 1, I get the following error:

KeyError: "Unknown task bert-base-cased, available tasks are ['audio-classification', 'automatic-speech-recognition', 'conversational', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-segmentation', 'image-to-text', 'mask-generation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text2text-generation', 'token-classification', 'translation', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection', 'translation_XX_to_YY']"
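
The error message suggests that `bert-base-cased` ended up in the task position. A hedged sketch of the likely fix, assuming the intent is to keep the `fill-mask` task and swap only the checkpoint: the first argument of `pipeline` stays the task name, and the checkpoint goes in the `model` argument. The helper name `make_unmasker` is hypothetical, and the import sits inside the function only so that the helper can be defined before the library is loaded.

```python
def make_unmasker(model_name="bert-base-cased"):
    # Imported here so that merely defining the helper does not load transformers.
    from transformers import pipeline

    # "fill-mask" is the task; the checkpoint is passed separately as `model`.
    return pipeline("fill-mask", model=model_name)
```

Calling `make_unmasker()("This [MASK] has been waiting for you.")` would then download the checkpoint on first use and return the candidate fillers for the masked position.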

Hi. I have a quick question regarding sequence-to-sequence models. At the end of the video, it shows that these models can be constructed by combining encoder models (e.g. BERT) and decoder models (e.g. GPT).
I was wondering, how can RoBERTa (encoder-only model) be used both as an encoder and decoder?

In the encoder-decoder architecture, the decoder looks only backwards, i.e., at the preceding tokens, as does the decoder-only architecture. The encoder-decoder architecture seems to be more powerful simply because there is a whole additional component (the encoder). If this is true, then why not use the encoder-decoder architecture for everything that is currently done by the decoder-only architecture (e.g., text generation)?

I noticed today that the error described in the link below still hasn’t been corrected. bhagerty is correct; “perspires” can only mean “sweats,” so the word should be “persists” instead. Chapter 1 questions - #14 by bhagerty

Similar to the given NLP use case, I want to build a model that can query an Excel file with multiple rows and columns, including text and numbers, and give answers from any part of it.
For example: the Excel file could hold an HR database and I could ask, "Tell me the assigned location value for vivek.kishore." What steps should I take to build this?
A brief outline of the steps would be helpful. Should I use a vector DB and Llama 2?

I am having trouble understanding how the sentence below is classified as negative.

In masking, why do some models use [MASK] while other models use <mask>? What is the difference between square brackets and angle brackets?

Here is a square bracket example,

from transformers import pipeline

filler = pipeline("fill-mask", model="bert-base-cased")
result = filler("This [MASK] has been waiting for you.")

Here is an angle bracket example,

unmasker = pipeline("fill-mask", model="distilroberta-base")
unmasker("This course will teach you all about <mask> models.")

Thanks so much!
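
The mask string is defined by each model's tokenizer rather than by the bracket style itself: BERT-style vocabularies use `[MASK]` and RoBERTa-style vocabularies use `<mask>`. A small sketch that builds the right prompt per checkpoint follows. The lookup table, the `{mask}` placeholder, and the function name are assumptions for illustration; with transformers installed, the same string is available as the tokenizer's `mask_token` attribute.

```python
# Mask strings for the two checkpoints used in the examples above.
MASK_TOKENS = {
    "bert-base-cased": "[MASK]",
    "distilroberta-base": "<mask>",
}

def build_prompt(template, model_name):
    # Swap the neutral {mask} placeholder for the checkpoint's own mask string.
    return template.replace("{mask}", MASK_TOKENS[model_name])

bert_prompt = build_prompt("This {mask} has been waiting for you.", "bert-base-cased")
roberta_prompt = build_prompt(
    "This course will teach you all about {mask} models.", "distilroberta-base"
)
print(bert_prompt)
print(roberta_prompt)
```

Writing prompts against a placeholder like this avoids passing a mask string that the chosen checkpoint does not recognise.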

Hello, I am using the Hugging Face classroom to teach MS Data Analytics courses. I require all students to take the course and I try to develop the lectures accordingly. May I ask about the slides used in the open NLP course? I would like to get them if they are available. Thank you.

I am trying to run the code in the first chapter under text generation. Specifically I ran these lines:

from transformers import pipeline

generator = pipeline("text-generation")
generator("In this course, we will teach you how to")

The output I got was:

No module named 'keras.saving.hdf5_format'

Keras is installed on my machine. What am I doing wrong?