Regarding the data fed into the input of transformer_xl or Transformer models

If I follow the code in the PyTorch version, the following data is fed into the model's input.
bsz = 10
target_len = 3

 1 11 21
 2 12 22
 3 13 23
 4 14 24
 5 15 25
 6 16 26
 7 17 27
 8 18 28
 9 19 29
10 20 30
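
If it helps to see this in code, here is a minimal sketch of one way to build that layout. The token stream 1–30 and the exact torch calls are my own assumptions for illustration, not necessarily the code in the repository:

```python
import torch

bsz = 10
target_len = 3

# Hypothetical token stream 1..30 standing in for one sentence.
stream = torch.arange(1, bsz * target_len + 1)

# One way to reproduce the (bsz, target_len) layout shown above:
# cut the stream into target_len chunks of length bsz and stack
# them as columns.
data = stream.view(target_len, bsz).t().contiguous()
print(data)
# tensor([[ 1, 11, 21],
#         [ 2, 12, 22],
#         ...
#         [10, 20, 30]])
```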

And all of this data is one sentence.

That’s where the question arises.

If data of the shape above enters through the model's input, will relationships be learned between the elements of the batch?

Let me explain in more detail to avoid any misunderstanding.

When the data enters the model, it is transposed, and the arrangement changes as shown below.
 1  2  3  4  5  6  7  8  9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
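
In code, this would be just another transpose of the sketch above (again, only an illustration):

```python
# Continuing the sketch above: the data is transposed once more on
# the way into the model, giving the (3, 10) arrangement shown here.
model_input = data.t()
print(model_input)
# tensor([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
#         [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
#         [21, 22, 23, 24, 25, 26, 27, 28, 29, 30]])
```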

In this state, the batch size is 3, and three separate inputs go into the model.

The problem here is that those three inputs together make up one sentence.

In this case, I wonder whether the three separate inputs entering one batch can be learned in a way that keeps them connected to each other.
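
To make the question concrete, here is a toy sketch of what I mean. The embedding size, vocabulary, and attention layer are hypothetical stand-ins, not the actual transformer_xl modules:

```python
import torch
import torch.nn as nn

# The (3, 10) arrangement from above, rebuilt so this example is
# self-contained.
model_input = torch.arange(1, 31).view(3, 10)

d_model = 16                                  # hypothetical embedding size
emb = nn.Embedding(64, d_model)               # toy vocabulary of 64 ids
attn = nn.MultiheadAttention(d_model, num_heads=2, batch_first=True)

x = emb(model_input)     # (3, 10, d_model): 3 batch elements of length 10
out, _ = attn(x, x, x)   # plain self-attention: each of the 3 rows attends
                         # only within itself, never across rows
```

In plain self-attention like this, nothing connects the three rows to one another, which is why I am asking whether transformer_xl can still learn them as one connected sentence.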

If it is possible, I would like to know the reason and the underlying principle.