Chapter 2 questions

Probably a little pitfall:

Hello, I believe there is a little pitfall in the instructions of “Behind the pipeline”; it looks like the following:

The negative score should be “0.11%” instead of “0.,11%”. This might cause some confusion.


"The problem is that we sent a single sequence to the model, whereas :hugs: Transformers models expect multiple sentences by default. " - this does not seem to be a problem with Tensorflow code. Tensorflow accepts single sequence also without need for batch dimension unlike PyTorch which requires the sequence to be given with additional batch dimension

Same here!
In fact, without adding the special tokens I got better results out of the softmax function.

Hi Sylvain
First of all, thanks a tonne for this detailed course! :pray:
I’m a bit confused about why I am not getting this error.

What is the difference between AutoTokenizer and BertTokenizer?

Hello Sylvain, I am just wondering if there’s a comprehensive list of tasks/pipelines anywhere? I thought it might be here: Tasks - Hugging Face, but the tutorial for this page mentions the “sentiment-analysis” task, which is not available at that link.


How do I know which tokenizer to choose?
Example:
"The dog’s ran into the church."
model 1: [ The, dog’s, ran, into, the, church]

model 2: [ The, dog, 's, ran, into, the, church]

These give a model two different meanings. How do I know whether to choose a tokenizer that stores the whole word or one that breaks a word into parts?
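A small sketch (the two checkpoints named here are only illustrative): the simplest way to see how a tokenizer splits a word is to run it, since the splitting rule is fixed by the pretrained checkpoint you pick.

from transformers import AutoTokenizer

sentence = "The dog's ran into the church."
for checkpoint in ("bert-base-uncased", "gpt2"):
    tok = AutoTokenizer.from_pretrained(checkpoint)
    # Each checkpoint applies its own (sub)word splitting rules.
    print(checkpoint, tok.tokenize(sentence))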

I don’t know why this error appears.

Both load the same vocabulary for a BERT checkpoint. BertTokenizer is the class written specifically for the BERT ecosystem and handles its special tokens, like [CLS] (classification token), [SEP] (separator token), and [MASK] (mask token).
AutoTokenizer is more general and versatile: it inspects the checkpoint you pass and instantiates the appropriate tokenizer class for you, so the same code can be used with many other models.
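A small sketch of that point, assuming the bert-base-uncased checkpoint:

from transformers import AutoTokenizer, BertTokenizer

auto_tok = AutoTokenizer.from_pretrained("bert-base-uncased")  # dispatches based on the checkpoint config
bert_tok = BertTokenizer.from_pretrained("bert-base-uncased")  # the BERT-specific class

print(type(auto_tok).__name__)  # BertTokenizerFast (AutoTokenizer prefers the fast variant)
print(type(bert_tok).__name__)  # BertTokenizer
print(auto_tok("Hello world"))  # both add the [CLS]/[SEP] token ids automatically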

Lesson: Behind the pipeline.

The lesson mentions the term “head” in the transformer architecture. Is it talking about attention heads? Something is not clear to me. Is it an encoder-only model? What are the hidden states in the transformer architecture?

Could you please help me understand: are tokenization and embeddings both required to use Transformers?
During inference, do we need to perform tokenization/embedding ourselves before passing the input?
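A hedged sketch of what inference looks like with the classes from this chapter (the checkpoint name is just the sentiment model used in the course): you run the tokenizer yourself, while the embedding layer lives inside the model, so you never call it explicitly.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Tokenization is the step you perform explicitly.
inputs = tokenizer("I love this course!", return_tensors="pt")

# The embedding lookup happens inside the model's first layer.
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits)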

Simple question

I’m a beginner with the HF library.

Why does Hugging Face define the AutoModel class and then insist that users use AutoModelForSequenceClassification instead?
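A hedged sketch of the difference, assuming the sentiment checkpoint from the chapter: AutoModel returns only the base transformer’s hidden states, while the AutoModelFor… classes add the task-specific head that turns those hidden states into logits.

import torch
from transformers import AutoModel, AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("I love this course!", return_tensors="pt")

base = AutoModel.from_pretrained(checkpoint)  # body only, no head
with torch.no_grad():
    print(base(**inputs).last_hidden_state.shape)  # (batch, seq_len, hidden_size)

clf = AutoModelForSequenceClassification.from_pretrained(checkpoint)  # body + classification head
with torch.no_grad():
    print(clf(**inputs).logits.shape)  # (batch, num_labels)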

Hi, when calling the model with the tokens tensor, why do you pass **tokens and not just tokens?

tokens = tokenizer(sequences, padding=True, truncation=True, return_tensors="tf")
output = model(**tokens)

I found this answer from Google Gemini
The double star (**) in a function call is used to unpack a dictionary into keyword arguments.

For example, if you have a dictionary my_dict = {'a': 1, 'b': 2}, calling my_function(**my_dict) is equivalent to calling my_function(a=1, b=2).

The Hugging Face model expects to receive keyword arguments (input_ids and attention_mask here), hence the unpacking is necessary.
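A tiny standalone sketch of the same idea (classify here is a hypothetical stand-in for the model call):

def classify(input_ids=None, attention_mask=None):
    # Hypothetical stand-in for model(...), just to show the unpacking.
    print(input_ids, attention_mask)

tokens = {"input_ids": [[101, 7592, 102]], "attention_mask": [[1, 1, 1]]}
classify(**tokens)  # the dict is unpacked into keyword arguments
classify(input_ids=[[101, 7592, 102]], attention_mask=[[1, 1, 1]])  # equivalent call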

Wondering if/how padding would lead to the same logits for longer sequences if we have to worry about positional encodings. It seems to me that the positional encoding added to non-pad tokens would be different than if you had no pad tokens to begin with?