Hello
I am using facebook/BART for a seq2seq task. I followed along with this tutorial, Translation - Hugging Face NLP Course, and found something weird.
When I use BART to predict a sample, there are -100 tokens in the output array:
array([ 2, 0, 8800, 3850, 37589, 1000, 3675, 23054, 7778,
2, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100, -100, -100, -100])
You can see that it already uses 1 as BART's padding token. So where does the -100 come from?
The model used in the tutorial is Marian, and it doesn't predict any -100.
These are my training args:
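For reference, this is roughly how I get those predictions (just a sketch; trainer is my Seq2SeqTrainer and tokenized_datasets is my tokenized data, set up the same way as in the tutorial):

# Sketch of how I get the array above; trainer and tokenized_datasets
# are from my own setup (same pattern as the tutorial).
predict_results = trainer.predict(tokenized_datasets["test"])
preds = predict_results.predictions  # shape (num_samples, sequence_length)
print(preds[0])  # prints the array shown above, including the -100 values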
args = Seq2SeqTrainingArguments(
    save_folder,
    overwrite_output_dir=True,
    logging_strategy="epoch",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-6,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    save_total_limit=3,
    num_train_epochs=20,
    predict_with_generate=True,
    fp16=True,
    report_to="none",
    load_best_model_at_end=True,
    seed=65,
    generation_max_length=128,
    generation_num_beams=10,
)
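For completeness, this is roughly how I plug those args into the trainer (a sketch following the tutorial; model, tokenizer, and tokenized_datasets come from my own preprocessing):

from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainer

# Sketch of the trainer setup (same pattern as the tutorial);
# model, tokenizer, and tokenized_datasets are assumed to exist already.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Seq2SeqTrainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)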
When there are -100 values in the predictions, they break this function:
import numpy as np

def postprocess(predictions, labels):
    predictions = predictions.cpu().numpy()
    labels = labels.cpu().numpy()
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    # Some simple post-processing
    decoded_preds = [pred.strip() for pred in decoded_preds]
    decoded_labels = [[label.strip()] for label in decoded_labels]
    return decoded_preds, decoded_labels
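The only workaround I can think of is to replace -100 in the predictions the same way as in the labels, but I am not sure that is the intended fix (just a sketch):

# Sketch of a possible workaround: treat -100 in the predictions
# the same way the labels are treated before decoding.
predictions = np.where(predictions != -100, predictions, tokenizer.pad_token_id)
decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)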
I would also like to know how to control the length of the predicted text. Right now my predictions have shape (m, 79), even though I set generation_max_length=128. Why 79?
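For what it's worth, this is how I assumed the length could be controlled when calling generate directly (a sketch; model and tokenizer are my fine-tuned BART model and its tokenizer):

# Sketch of calling generate directly with an explicit length cap;
# model and tokenizer are assumed to be the fine-tuned BART model/tokenizer.
inputs = tokenizer(["some input text"], return_tensors="pt", truncation=True)
generated = model.generate(
    **inputs,
    max_length=128,  # upper bound on the generated sequence length
    num_beams=10,
)
print(generated.shape)  # in practice the second dimension varies, e.g. 79

Is this the right knob, or should generation_max_length in the training args already take care of it?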