BART-base generating completely wrong output after training for more than 3 epochs

I’m training BART-base to add HTML formatting to arbitrary text. I trained it on thousands of web articles, using the example translation script (run_translation.py):

python3 run_translation.py \
    --model_name_or_path "facebook/bart-base"  \
    --do_train \
    --do_eval \
    --source_lang "raw" \
    --target_lang "html" \
    --num_train_epochs=3 \
    --train_file "./datasets/training_articles.json" \
    --validation_file "./datasets/validation_articles.json" \
    --output_dir "./model" \
    --per_device_train_batch_size=4 \
    --per_device_eval_batch_size=4 \
    --overwrite_output_dir \
    --predict_with_generate \
    --max_source_length 560 \
    --max_target_length 560 \
    --val_max_target_length 560 \
    --save_strategy "epoch"

When training for 3 epochs (the default), the results are really good! I’m getting a BLEU score of about 86, with a loss of about 0.02 during training and 0.03 during validation:

{
    "epoch": 3.0,
    "eval_bleu": 86.1494,
    "eval_gen_len": 127.8183,
    "eval_loss": 0.032098282128572464,
    "eval_runtime": 276.0122,
    "eval_samples": 556,
    "eval_samples_per_second": 2.014,
    "eval_steps_per_second": 0.504
}

I use this to predict the output:

inputs = tokenizer([input_text], max_length=1024, return_tensors='pt')
summary_ids = model.generate(inputs['input_ids'], num_beams=1, max_length=1024, early_stopping=True)
raw_output = [tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids]
output = ''.join(raw_output)

The results are impressive when I divide an article into chunks. This is from an article that’s not even related to the dataset:

INPUT:
gosphere. One common way to elevate the importance of a new album is to turn it into an event. Release parties are customary, but often lack intimacy and focus (and usually cost money to book a venue). A much better alternative is to host a listening party in your own home studio.
I recently attended one such event for a studio vocalist named Sciren. Inspired by the success of her listening party, the following is some practical advice to make your own personal affair a similar victory.
Three Tips to Throw 

PREDICTED OUTPUT:
gosphere. One common way to elevate the importance of a new album is to turn it into an event. Release parties are customary, but often lack intimacy and focus (and usually cost money to book a venue). A much better alternative is to host a listening party in your own home studio.
I recently attended one such event for a studio vocalist named Sciren. Inspired by the success of her listening party, the following is some practical advice to make your own personal affair a similar victory.
<h2>Three Tips to Throw 

INPUT:
a Successful Release Party
1. Target wisely
Before you even reach out, do the research to really understand what type of music makes a particular writer most excited. Ask someone you respect (who will give you an honest, unbiased opinion) if they think your music aligns with the blogger’s taste. Then, and only then, reach out and personally invite them to your event.
If they reply, share a private link to the music you will be debuting. By giving them special access to unreleased material, you show your res

PREDICTED OUTPUT:
a Successful Release Party</h2>
<h3>1. Target wisely</h3>:
Before you even reach out, do the research to really understand what type of music makes a particular writer most excited. Ask someone you respect (who will give you an honest, unbiased opinion) if they think your music aligns with the blogger’s taste. Then, and only then, reach out and personally invite them to your event.
If they reply, share a private link to the music you will be debuting. By giving them special access to unreleased material, you show your res

INPUT:
pect for their opinion. If they like what they hear in advance of the event, it strengthens their commitment to attending.
All of this pre-event activity is crucial because, why would you want someone at your listening party if you don’t truly think they will like what you have to offer? If there was one recurring theme from my article on getting a blogger to notice your music, it was not to waste peoples’ time- neither yours nor theirs.
2. Be hospitable
Once you have successfully drawn the blogger to your

PREDICTED OUTPUT:
pect for their opinion. If they like what they hear in advance of the event, it strengthens their commitment to attending.
All of this pre-event activity is crucial because, why would you want someone at your listening party if you don’t truly think they will like what you have to offer? If there was one recurring theme from my article on getting a blogger to notice your music, it was not to waste peoples’ time- neither yours nor theirs.
<h2>2. Be hospitable</h2>:Once you have successfully drawn the blogger to your

As you can see, it managed to add header tags, even across chunk boundaries.
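
A minimal sketch of the chunk-and-rejoin approach described above, assuming fixed-size character chunks. The examples are cut mid-word, which is consistent with that, but the actual chunk size is not stated in this post, so `chunk_chars=500` is only a placeholder. The names `split_into_chunks`, `format_article` and the `predict` callable are hypothetical; `predict` stands in for the tokenizer/generate/decode code shown earlier.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_chars: int = 500) -> List[str]:
    """Cut text into consecutive pieces of at most chunk_chars characters.

    The cut ignores word and sentence boundaries, so a chunk may end mid-word.
    """
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


def format_article(text: str, predict: Callable[[str], str], chunk_chars: int = 500) -> str:
    """Run predict on every chunk and concatenate the results in order."""
    return "".join(predict(chunk) for chunk in split_into_chunks(text, chunk_chars))
```

With an identity `predict` (for example `lambda s: s`), `format_article` returns the article unchanged, which is a quick way to check that the chunking itself loses nothing.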

I tried training it for 8 epochs and I’m getting the following eval results:

{
    "epoch": 8.0,
    "eval_bleu": 90.926,
    "eval_gen_len": 116.831,
    "eval_loss": 0.05103478953242302,
    "eval_runtime": 1535.0735,
    "eval_samples": 4178,
    "eval_samples_per_second": 2.722,
    "eval_steps_per_second": 0.681
}

Looks even better, right? A BLEU score of 90.
The problem arises when I try to predict an output. This time it looks like the model is adding even more “HTML tags” between some sentences, but they’re complete gibberish. It almost seems like the tokenizer is mixing up the tokens and emitting the wrong words, for example the word “Buyable”:

INPUT:
gosphere. One common way to elevate the importance of a new album is to turn it into an event. Release parties are customary, but often lack intimacy and focus (and usually cost money to book a venue). A much better alternative is to host a listening party in your own home studio.       
I recently attended one such event for a studio vocalist named Sciren. Inspired by the success of her listening party, the following is some practical advice to make your own personal affair a similar victory.
Three Tips to Throw

PREDICTED OUTPUT:
gosphere. One common way to elevate the importance of a new album is to turn it into an event. Release parties are customary, but often lack intimacy and focus (and usually cost money to book a venue). A much better alternative is to host a listening party in your own home studio. guerrillaI recently attended one such event for a studio vocalist named Sciren. Inspired by the success of her listening party, the following is some practical advice to make your own personal affair a similar victory. exclusivelyThree Tips to Throw

INPUT:
a Successful Release Party
1. Target wisely
Before you even reach out, do the research to really understand what type of music makes a particular writer most excited. Ask someone you respect (who will give you an honest, unbiased opinion) if they think your music aligns with the blogger’s taste. Then, and only then, reach out and 
personally invite them to your event.
If they reply, share a private link to the music you will be debuting. By giving them special access to unreleased material, you show your res

PREDICTED OUTPUT:
a Successful Release PartyBuyable1. Target wiselyablishBefore you even reach out, do the research to really understand what type of music makes a particular writer most excited. Ask someone you respect (who will give you an honest, unbiased opinion) if they think your music aligns with the blogger’s taste. Then, and only then, reach out and personally invite them to your event.ablishIf they reply, share a private link to the music you will be debuting. By giving them special access to unreleased material, you show your res

INPUT:
pect for their opinion. If they like what they hear in advance of the event, it strengthens their commitment to attending.
All of this pre-event activity is crucial because, why would you want someone at your listening party if you don’t truly think they will like what you have to offer? If there was one recurring theme from my article on getting a blogger to notice your music, it was not to waste peoples’ time- neither yours nor theirs.
2. Be hospitable
Once you have successfully drawn the blogger to your

PREDICTED OUTPUT:
pect for their opinion. If they like what they hear in advance of the event, it strengthens their commitment to attending. coerciveAll of this pre-event activity is crucial because, why would you want someone at your listening party if you don’t truly think they will like what you have to offer? If there was one recurring theme from my article on getting a blogger to notice your music, it was not to waste peoples’ time- neither yours nor theirs.ablish2. Be hospitableablishOnce you have successfully drawn the blogger to your
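
To list exactly what was inserted, the input and the prediction can be compared character by character. The following is a rough diagnostic sketch using only the Python standard library, not part of the training or prediction code above. `inserted_text` and `TAG_RE` are hypothetical names, and the tag pattern is a simplification that only strips basic tags such as <h2> and </h3>, so that legitimate tags are not reported as inserted text.

```python
import difflib
import re
from typing import List

# Simplified pattern for opening/closing HTML tags such as <h2> or </h3>.
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")


def inserted_text(source: str, prediction: str) -> List[str]:
    """Return the non-whitespace fragments of prediction that do not align with source.

    HTML tags are removed from the prediction first, so only unexpected
    text (not the formatting the model is supposed to add) is reported.
    """
    stripped = TAG_RE.sub("", prediction)
    matcher = difflib.SequenceMatcher(a=source, b=stripped, autojunk=False)
    fragments = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if op in ("insert", "replace"):
            fragment = stripped[j1:j2].strip()
            if fragment:
                fragments.append(fragment)
    return fragments
```

On a shortened version of the second example ("Release Party", a line break, "1. Target wisely", a line break, "Before you"), this reports "Buyable" and "ablish" for the 8-epoch output and nothing for a correctly tagged output. Running it over predictions from each saved checkpoint would show from which epoch the extra fragments first appear.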

I checked the checkpoints: the epoch 3 checkpoint from this same run performs properly (just as expected), but from epoch 4 onward the model starts doing this. I’m so confused. I feel like the code I’m using to predict is wrong… The evaluation step should be hitting the same errors, yet it reports an even better score. Could this be some sort of software or maybe hardware bug?

Any help will be appreciated!