Chapter 1 questions

I do not see Chapters 10-12, even though this Introduction chapter says that ‘Chapters 9 to 12 go beyond NLP, and explore how Transformer models can be used to tackle tasks in speech processing and computer vision’.

May I know whether Chapters 11-12 are still under development and yet to be released?


In the original architecture, decoders have two inputs: the first is the text prompted to or generated by the decoder, and the other is the embedding from the encoder. GPT-like models are classified as decoders. So when a GPT model generates text, the prompt is provided by the user, but where does the model get the input embedding from?

Better Encoder-Decoders example

The example used in the encoders/decoders video doesn’t really do much to help understand the actual inputs and outputs.

On the Encoder side, it has:

Welcome to NYC

But on the Decoder side, it just has Start of Sequence in the input and then Word_1, Word_2, etc.

This left me pretty confused as to what you would actually pass into the model. In this example, would the “Welcome to NYC” input be an inference input (prompt)? Or are the Encoder outputs evaluated during training and then embedded into the decoder?

It would be very helpful to provide a full end-to-end example showing a real-life use case (maybe a “completion” style model).

What optimization metrics are used to train large language models?

I tried out the question answering example from Chapter 1. My context was a Wikipedia article about 3772 characters long, and I asked 11 simple questions, carefully worded so that the answers could be easily extracted from the context. It amazes me when it gets the right answers, but only 7/11 were at least partially correct. I tried a model trained on SQuAD v2 and got identical results. What might I need to do to get more appropriate answers? Eventually I want to use a large set of documentation as the context from which simple questions would extract answers.

I was experimenting with the question answering code in Chapter 1 and am trying to improve the results. I am using TensorFlow and would like to perform the operations without the pipeline; perhaps by tokenizing and breaking up the context somehow I can get better results. So far I can’t work out the right sequence of steps to replace the pipeline. I have been looking for an example that shows how to do this, but have not found any. I don’t want to get into training the model at this point, as I am using models pretrained on SQuAD v1/v2. Can anyone point to an example?
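Not a full pipeline replacement, but here is a minimal sketch of the "breaking up the context" idea in plain Python: split the tokenized context into overlapping windows so an answer that straddles a boundary still appears whole in at least one window. The function name split_into_windows and the whitespace "tokenizer" are made up for illustration. As far as I know, the Hugging Face tokenizers can do the equivalent themselves when called with max_length, stride, truncation and return_overflowing_tokens=True, where stride is the number of overlapping tokens, which is the convention assumed here.

```python
def split_into_windows(tokens, max_len, stride):
    """Split a token list into windows of at most max_len tokens.

    stride is the number of tokens shared by two consecutive windows
    (an assumption chosen to mirror the tokenizer convention).
    """
    if not 0 <= stride < max_len:
        raise ValueError("stride must satisfy 0 <= stride < max_len")
    step = max_len - stride  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the context
        start += step
    return windows


# Toy example: whitespace splitting stands in for a real tokenizer.
context_tokens = "the quick brown fox jumps over the lazy dog again".split()
windows = split_into_windows(context_tokens, max_len=4, stride=2)
for window in windows:
    print(window)
```

Each window would then be paired with the question and passed to the model separately, keeping the answer span with the highest score across all windows.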

There is a video on YouTube, “Inside the Question Answer Pipeline”, that shows some code. But there are two variables, start_pos and end_pos, that appear to be undefined. Does anyone know what they are supposed to be?

The problem is with the TensorFlow version of the video. The PyTorch version does not use either of these variables and creates the scores matrix using:
scores = start_probabilities[:, None] * end_probabilities[None,:]
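
For anyone following along, here is a small NumPy sketch of how that scores matrix can be turned into an answer span. It assumes start_probabilities and end_probabilities are 1-D arrays with one probability per token; the best_span helper and the toy numbers are made up for illustration and are not taken from the video. The upper-triangle mask throws away spans whose end comes before their start.

```python
import numpy as np


def best_span(start_probabilities, end_probabilities):
    # scores[i, j] = P(answer starts at token i) * P(answer ends at token j)
    scores = start_probabilities[:, None] * end_probabilities[None, :]
    # Zero out the lower triangle so that only spans with end >= start remain.
    scores = np.triu(scores)
    # argmax works on the flattened matrix; convert back to (row, column).
    flat_index = int(scores.argmax())
    start_index, end_index = divmod(flat_index, scores.shape[1])
    return start_index, end_index, float(scores[start_index, end_index])


# Toy probabilities for a 4-token context (illustrative values only).
start_probabilities = np.array([0.1, 0.7, 0.1, 0.1])
end_probabilities = np.array([0.6, 0.1, 0.25, 0.05])
start_index, end_index, score = best_span(start_probabilities, end_probabilities)
print(start_index, end_index, score)
```

Without the mask, the highest product in this toy case would pair start token 1 with end token 0, which is not a valid span.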

Got this working with small contexts.