Using BART for a classification task (fine-tuning): input format

I have a custom dataset that does not consist of text documents.

I already have a pretrained BART model. For that pretraining, the data fed to the BART model was formatted like this:

For a single input:

```
"input_ids": [code1 code2 sep_token_id mask_token_id sep_token_id code5 sep_token_id code10 ... sep_token_id mask_token_id]
"attention_mask": [1 1 1 1 ... 1 1 1]  # number of 1s = the length of "input_ids"
"decoder_input_ids": [EOS code1 code2 201 code3 201 code4 201 code5 201 code10 ... 201 code250]
```

Now I want to fine-tune that model for classification.

For using the BartForSequenceClassification model:
I am using BertTokenizer to load the vocab file and a custom data collator to generate the inputs in the proper format: "input_ids", "attention_mask", "decoder_input_ids", and "labels".
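
Roughly, the collator does something like this (a simplified sketch, not my exact code; the padding details and the `label` key are just how I happen to store my examples, and the decoder side is omitted):

```python
import torch

class ClassificationDataCollator:
    """Simplified sketch of my custom data collator: pads each example and
    builds the tensors handed to BartForSequenceClassification."""

    def __init__(self, pad_token_id, max_length=512):
        self.pad_token_id = pad_token_id
        self.max_length = max_length

    def __call__(self, features):
        input_ids, attention_mask, labels = [], [], []
        for f in features:
            ids = f["input_ids"][: self.max_length]
            mask = [1] * len(ids)
            pad = self.max_length - len(ids)
            input_ids.append(ids + [self.pad_token_id] * pad)
            attention_mask.append(mask + [0] * pad)
            labels.append(f["label"])
        # (My real collator also builds "decoder_input_ids" following the
        # pretraining format above; omitted here for brevity.)
        return {
            "input_ids": torch.tensor(input_ids),
            "attention_mask": torch.tensor(attention_mask),
            "labels": torch.tensor(labels),
        }
```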

I am getting an error on the 2nd line of this snippet (from BartForSequenceClassification.forward):

```python
eos_mask = input_ids.eq(self.config.eos_token_id)
sentence_representation = x[eos_mask, :].view(x.size(0), -1, x.size(-1))[:, -1, :]
```

because my input_ids do not contain any eos_token_id, so eos_mask is all False and the view() call fails.
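
A tiny repro of what I think is happening (the 2 here just stands in for eos_token_id; none of the generated ids match it):

```python
import torch

x = torch.randn(2, 6, 8)                    # (batch, seq_len, hidden_size)
input_ids = torch.randint(10, 100, (2, 6))  # no eos_token_id anywhere
eos_mask = input_ids.eq(2)                  # all False, since 2 never occurs
# RuntimeError: cannot reshape a tensor of 0 elements, the -1 is ambiguous
x[eos_mask, :].view(x.size(0), -1, x.size(-1))[:, -1, :]
```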

Could you please tell me the proper format of input_ids, decoder_input_ids, and the other inputs in the case of BartForSequenceClassification?

  • Is the format the same as above?

@ArthurZ could you please tell me if this is the correct format?

Hey! The eos token can be replaced with anything as long as it serves the purpose. For example, you can use the sep_token or the mask_token to solve the error.
I am not entirely sure what you are doing, and I am not a pro at training BART either, so I would rather point you to the courses we have about training encoder-decoders!
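
For example, something along these lines should get past that error (a minimal sketch, not tested on your setup; the paths and num_labels are placeholders, and it assumes the token you point eos_token_id to actually appears in every sequence):

```python
from transformers import BartForSequenceClassification, BertTokenizer

# Placeholders: point these at your own vocab file and pretrained checkpoint.
tokenizer = BertTokenizer("path/to/vocab.txt")
model = BartForSequenceClassification.from_pretrained("path/to/pretrained-bart", num_labels=2)

# Pool the sentence representation on the sep token (or mask token) instead of
# an eos token that never occurs in your input_ids.
model.config.eos_token_id = tokenizer.sep_token_id
```

Keep in mind that the classification head gathers the hidden state at every position equal to config.eos_token_id and keeps the last one per example, so it is safest to have the same number of those tokens in every sequence, ideally exactly one at the very end.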