Issue with finetuning a seq-to-seq model

I am using the finetune.py script from the seq2seq examples to fine-tune for a QA task:

export NQOPEN_DIR=/home/danielk/nqopen_csv
export OUT=/home/danielk/fine_tune_t5_small

python3 finetune.py \
--data_dir $NQOPEN_DIR \
--model_name_or_path t5-small --tokenizer_name t5-small \
--learning_rate=3e-4 --freeze_encoder --freeze_embeds \
--do_train --train_batch_size 16 \
--do_predict --n_train -1 \
 --eval_beams 2 --eval_max_gen_length 142 \
--val_check_interval 0.25 --n_val 3000 \
--output_dir $OUT --gpus 4 --logger_name wandb \
--save_top_k 3 

Here is how my inputs/outputs look:

$ head -n 10 ~/Desktop/nqopen_csv/train.source
total number of death row inmates in the us?
big little lies season 2 how many episodes?
who sang waiting for a girl like you?
where do you cross the arctic circle in norway?
who is the main character in green eggs and ham?
do veins carry blood to the heart or away?
who played charlie bucket in the original charlie and the chocolate factory?
what is 1 radian in terms of pi?
when does season 5 of bates motel come out?
how many episodes are in series 7 game of thrones?
$ head -n 10 ~/Desktop/nqopen_csv/train.target
2,718
seven
Foreigner
Saltfjellet
Sam - I - am
to
Peter Gardner Ostrum
1 / 2π
February 20 , 2017
seven

After fine-tuning, I use the following script to get example generations:

path = "/Users/danielk/ideaProjects/fine_tune_t5_small/best_tfmr"
model = T5ForConditionalGeneration.from_pretrained(path)
tokenizer = T5Tokenizer.from_pretrained(path)
model.eval()

def run_model(input_string, **generator_args):
    # input_string += "</s>"
    input_ids = tokenizer.encode(input_string, return_tensors="pt")
    res = model.generate(input_ids, **generator_args)
    tokens = [tokenizer.decode(x) for x in res]
    print(tokens)

run_model("how many states does the US has? ")
run_model("who is the US president?")
run_model("who got the first nobel prize in physics?")
run_model("when is the next deadpool movie being released?")
run_model("which mode is used for short wave broadcast service?")
run_model("the south west wind blows across nigeria between?")

which gives me the following responses:

['44,100 state legislatures, 391,415 state states,527 states ; 521 states : 517 states']
['President Pro - lect Ulysses S. Truman and Mr. President Proseudo - Emees']
['Wilhelm Conrad Röntgen of Karl - Heinz Zurehmann - Shelgorithsg ⁇ rd']
['December 14, 2018. 05 - 02 - 03 - 08 - 13 - 2022. 2022']
['Fenway Wireless, Bluetooth, wireless channel system, WMV, FMN type 3D system.E.N']
["Nigeria's natural gas, but some other half saggbourns ; they reboss"]

which are quite bad.
For comparison, when I used a T5-small model fine-tuned with TPU (tensorflow), I get the following predictions:

['50']
['Donald Trump']
['Wilhelm Conrad Röntgen']
['December 18, 2018']
['TCP port 25']
['the Nigerian and Pacific Oceans']

Any thoughts on what is going wrong?

@sshleifer

I have never used examples/seq2seq/finetune.py for QA, but @valhalla may have something similar.

@danyaljj can you post the TF training code you used that worked well?

Here is the TF code I used for training:

PROJECT=...
ZONE=...
BUCKET=...
TPU=...
MODEL_DIR=...

TASK=natural_questions_open_light_mixture
PRETRAINED_DIR=gs://t5-data/pretrained_models/small
PRETRAINED_STEPS=1000000
FINETUNE_STEPS=15000


# Run fine-tuning
python -m t5.models.mesh_transformer_main \
  --module_import="nqopen_tasks" \
  --tpu="${TPU}" \
  --gcp_project="${PROJECT}" \
  --tpu_zone="${ZONE}" \
  --model_dir="${MODEL_DIR}" \
  --gin_file="dataset.gin" \
  --gin_file="${PRETRAINED_DIR}/operative_config.gin" \
  --gin_param="utils.tpu_mesh_shape.tpu_topology = '8x8'" \
  --gin_param="MIXTURE_NAME = '${TASK}'" \
  --gin_param="utils.run.save_checkpoints_steps=1000" \
  --gin_param="utils.run.batch_size=('tokens_per_batch', 393216)" \
  --gin_param="utils.run.train_steps=$((PRETRAINED_STEPS + FINETUNE_STEPS))" \
  --gin_param="utils.run.init_checkpoint='${PRETRAINED_DIR}/model.ckpt-${PRETRAINED_STEPS}'" \
  --gin_param="utils.run.learning_rate_schedule=@learning_rate_schedules.constant_learning_rate" \
  --gin_param="constant_learning_rate.learning_rate=1e-3" \
  --t5_tfds_data_dir="${BUCKET}/t5-tfds"

where natural_questions_open_light_mixture is the name of the dataset I was training on.

Btw, I tried BART with the same script and it seems to be working much better:

export NQOPEN_DIR=/home/danielk/nqopen_csv
export OUT=/home/danielk/fine_tune_bart_oct21 

python3 finetune.py \
--data_dir $NQOPEN_DIR \
--model_name_or_path facebook/bart-base --tokenizer_name facebook/bart-base \
--learning_rate=3e-4 --freeze_encoder --freeze_embeds \
--do_train --train_batch_size 16 \
--do_predict --n_train -1 \
--val_check_interval 0.25 --n_val 3000 \
--output_dir $OUT --gpus 4 --logger_name wandb \
--save_top_k 3 

The following ROUGE graph shows that BART-base (blue) is doing much better than T5-small (brown). T5-small barely gets off the ground, which is quite surprising.

@sshleifer maybe a simpler question is: has there been a successful effort to fine-tune T5 models with HF?
Btw, I am using version 3.3.1 of transformers, in case there have been any major bug fixes since then.

Yes https://github.com/huggingface/transformers/issues/4426#issuecomment-714521374

cc @valhalla (same person as patil-suraj on github)

Hi @danyaljj, have a look at the community notebooks section.
There are multiple notebooks which show how to fine-tune T5 for different tasks, including QA.

ROUGE might not be the best metric for QA.

Here’s what I think:

  1. The default task for finetune.py is summarization, and it uses the generate parameters for summarization tasks, which are not useful here.
  2. An eval_max_gen_length of 142 seems too large for a QA task; it should be lower IMO.
  3. Using beam search might not give good results for QA; in the T5 paper they used greedy decoding for QA.

When calling generate, the script could be using the summarization generate parameters, which would explain the longer answers.
Try greedy decoding with generate: set num_beams to 1, a smaller max_length (32 should be enough; for SQuAD, 16 is fine), and a length_penalty of 0.
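
Concretely, something like the following (a minimal sketch reusing the run_model helper from the first post; the exact values are suggestions, not finetune.py flags):

# Sketch: greedy decoding with a short max length, via run_model from above.
run_model(
    "who got the first nobel prize in physics?",
    num_beams=1,        # greedy decoding instead of beam search
    max_length=32,      # answers are short; 16 is enough for SQuAD-style QA
    length_penalty=0.0, # neutral length penalty (only matters when num_beams > 1)
)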

LMK if this helps


I just want to get one thing confirmed: is it true that when using finetune.py for summarization with T5 (where we use T5ForConditionalGeneration), we do not need to explicitly prepend summarize: to each input text? Is that taken care of already?

TIA!
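
(For context, the pretrained T5 configs do ship a summarize: prefix under task_specific_params; whether finetune.py applies it automatically is exactly what is being asked here. A quick way to inspect it, assuming the t5-small checkpoint:)

from transformers import T5Config

config = T5Config.from_pretrained("t5-small")
# The pretrained config carries task-specific generation settings, including the prefix:
print(config.task_specific_params["summarization"]["prefix"])  # "summarize: "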

@valhalla thanks for the feedback!

It doesn’t seem like there is any --length_penalty parameter for finetune.py, but I think I set the rest of the parameters you suggested. I repeated the experiment with the following command:

export NQOPEN_DIR=/home/danielk/nqopen_csv
export OUT=/home/danielk/fine_tune_t5_small_9

python3 finetune.py \
--data_dir $NQOPEN_DIR \
--model_name_or_path t5-small --tokenizer_name t5-small \
--learning_rate=1e-4 --freeze_encoder --freeze_embeds \
--do_train --train_batch_size 16 \
--do_predict --n_train -1 \
--max_source_length=32 \
--max_target_length=16 \
--eval_max_gen_length 16 \
--eval_beams=1 \
--val_check_interval 0.25 --n_val 3000 \
--output_dir $OUT --gpus 4 --logger_name wandb \
--save_top_k 3 

Here is the output (pink), and unfortunately it’s no bueno: a ROUGE of 5% is barely above random output.

If the fine-tuning works, it should quickly move north of ROUGE==25%.

I suspect that we have a bug here: https://github.com/huggingface/transformers/blob/master/examples/seq2seq/finetune.py#L104-L106

We set decoder_start_token_id when the model is BART, but nothing is set for a T5 model.

Thoughts @sshleifer @valhalla?

I don’t think decoder_start_token_id is the issue; it will be set from the config. Your command also looks reasonable.
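
For example, a quick sanity check of what the pretrained config provides (the printed values are what I would expect for t5-small):

from transformers import T5Config

config = T5Config.from_pretrained("t5-small")
print(config.decoder_start_token_id)  # expected: 0 -- T5 starts decoding from the pad token
print(config.pad_token_id)            # expected: 0
print(config.eos_token_id)            # expected: 1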

Debugging ideas:

  • Try 1 GPU.
  • Look at the saved text_batch.json to make sure it makes sense.
  • Make sure the command is the same as it was for BART.
  • Remove n_train=-1.

@sshleifer

I tried dropping n_train=-1 and used 1 GPU:

python3 finetune.py \
--data_dir $NQOPEN_DIR \
--model_name_or_path t5-small --tokenizer_name t5-small \
--learning_rate=1e-4  \
--do_train \
--train_batch_size 128 \
--do_predict \
--max_source_length=12 \
--max_target_length=6 \
--eval_max_gen_length 6 \
--eval_beams=1 \
--val_check_interval 1.0 --n_val 3000 \
--output_dir $OUT --gpus 1 --logger_name wandb \
--save_top_k 3 

The result is the orange plot (fine-tuning t5-small), compared to the BART outputs (in blue). As seen in the figure, fine-tuning t5-small barely moves above random predictions.

Here is the content of my text_batch.json file btw:

{
    "input_ids": [
        "when does the new my hero academia movie come out",
        "when did wesley leave last of the summer wine?",
        "where is the most distortion on a robin",
        "where is each type of cartilage located in the body",
        "when were manatees put on the endangered list",
        "when was the first manmade object sent into space?",
        "who is the chief legal advisor to the government?",
        "where was the mona lisa kept during",
        "who sang the original version of true colors?",
        "who is charles off of pretty little l",
        "who won the pittsburgh steeler balt",
        "who played v in the movie v for vend",
        "who is the youngest judge currently sitting on the u",
        "who wrote variations on twinkle twinkle little star?",
        "who plays the fairy god mother in shrek 2",
        "element whose third shell contains two p electrons",
        "who played davenport in last of the summer wine",
        "i've got a brand new pair of",
        "where are the fruits of the spirit found in the bible",
        "who played mason on wizards of waverly",
        "who plays mayor hamilton on nc",
        "what is the postal code for warri nig",
        "where was sir gawain and the green",
        "the story idea for the yellow wallpaper was based on",
        "the good doctor season 1 episode 2 air date?",
        "what is the name of the college in the classic movie",
        "where is the new years eve concert held?",
        "who represented russia at the congress of vienn",
        "when did the second amendment go into effect?",
        "how much does an arleigh burke destroyer cost",
        "what cities are in santa rosa",
        "how many episodes in the itv series girlfriends?"
    ],
    "attention_mask": [
        32,
        12
    ],
    "labels": [
        "July 5, 2018",
        "2002",
        "close to the poles",
        "bronchial tubes",
        "1966",
        "3 October 1942",
        "Law Officers of the Crown",
        "the Ingres Museum",
        "Cyndi Lauper",
        "Drake",
        "Pittsburgh Steelers",
        "Hugo Weaving",
        "Neil Gorsuch",
        "Jane Taylor",
        "Jennifer Saunders",
        "Magnesium",
        "Josephine Tewson",
        "Melanie",
        "Epistle to the Galatians",
        "Sulkin",
        "Steven Robert Weber",
        "332",
        "late 14th - century",
        "rest cure",
        "October 2, 2017",
        "Faber College",
        "Westminster Central Hall",
        "Count Karl Robert Nesselrode",
        "December 15, 1791",
        "US $1.843 billion",
        "Gulf Breeze",
        "("
    ],
 "ids": [
        "",
        "s",
        "and",
        "e",
        "is",
        "i",
        "I",
        "'",
        "en",
        "your",
        "have",
        "was",
        "m",
        "!",
        "not",
        "y",
        "die",
        "A",
        "c",
        "they",
        "ul",
        "/",
        "out",
        "up",
        "if",
        "do",
        "h",
        "\u00een",
        "b",
        "other",
        "cu",
        "or"
    ],
    "decoder_input_ids": [
        "July 5, 2018",
        "2002",
        "close to the poles",
        "bronchial tubes",
        "1966",
        "3 October 1942",
        "Law Officers of the Crown",
        "the Ingres Museum",
        "Cyndi Lauper",
        "Drake",
        "Pittsburgh Steelers",
        "Hugo Weaving",
        "Neil Gorsuch",
        "Jane Taylor",
        "Jennifer Saunders",
        "Magnesium",
        "Josephine Tewson",
        "Melanie",
        "Epistle to the Galatians",
        "Sulkin",
        "Steven Robert Weber",
        "332",
        "late 14th - century",
        "rest cure",
        "October 2, 2017",
        "Faber College",
        "Westminster Central Hall",
        "Count Karl Robert Nesselrode",
        "December 15, 1791",
        "US $1.843 billion",
        "Gulf Breeze",
        "("
    ]
}

which looks normal (except possibly attention_mask, which I don’t quite understand).

For comparison, here is the output from text_batch.json file for BART (which seems to work well).

{
    "input_ids": [
        "<s> who sings does he love me with reba?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
        "<s> what is the smallest prime number that is greater than 30?</s><pad><pad><pad><pad><pad><pad>",
        "<s> who introduced the system of civil services in india?</s><pad><pad><pad><pad><pad><pad><pad>",
        "<s> when was the public service commission original version of the upsc set up?</s><pad><pad><pad>",
        "<s> who wrote the song two out of three ain't bad?</s><pad><pad><pad><pad><pad><pad>",
        "<s> who has the most receiving yards in one game?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
        "<s> how many games to get premier league medal?</s><pad><pad><pad><pad><pad><pad><pad><pad><pad>",
        "<s> what do they call snowboarders in johnny tsunami?</s><pad><pad><pad><pad><pad><pad>",
        "<s> who is the old man in waiting on a woman?</s><pad><pad><pad><pad><pad><pad><pad>",
        "<s> in attack on titan who is the female titan?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
        "<s> who got pregnant in gossip girl season 5?</s><pad><pad><pad><pad><pad><pad><pad><pad><pad>",
        "<s> who sang the theme from the greatest american hero?</s><pad><pad><pad><pad><pad><pad><pad>",
        "<s> who wins the 2017 australian open men's single title?</s><pad><pad><pad><pad><pad>",
        "<s> who does the voice for love island australia?</s><pad><pad><pad><pad><pad><pad><pad>",
        "<s> when did the king kong ride burn down?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
        "<s> 5 types of control that could be programmed on a gui?</s><pad><pad><pad><pad><pad><pad>",
        "<s> who is the ceo of t rowe price?</s><pad><pad><pad><pad><pad><pad><pad>",
        "<s> who plays chaka in land of the lost?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
        "<s> who is the head coach of the minnesota timberwolves?</s><pad><pad><pad><pad><pad><pad>",
        "<s> when was the planning commission set up to prepare a blue print of development for the country?</s>",
        "<s> where is the mesophyll located in a plant?</s><pad><pad><pad><pad><pad><pad><pad>",
        "<s> who sings bartender i really did it this time?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
        "<s> who wrote somebody like you by keith urban?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
        "<s> in what year did japan attack pearl harbor?</s><pad><pad><pad><pad><pad><pad><pad><pad>",
        "<s> when is the show six coming back on?</s><pad><pad><pad><pad><pad><pad><pad><pad><pad>",
        "<s> who does haruhi end up with in ouran highschool host club?</s><pad><pad>",
        "<s> who was the first woman appointed to the supreme court?</s><pad><pad><pad><pad><pad><pad><pad>",
        "<s> who controlled blue cross when it was formed?</s><pad><pad><pad><pad><pad><pad><pad><pad><pad>",
        "<s> the most readily absorbed form of iron in the diet is?</s><pad><pad><pad><pad><pad><pad>",
        "<s> who has the highest minimum wage in the usa?</s><pad><pad><pad><pad><pad><pad><pad>",
        "<s> when does fairy tail dragon cry come out in canada?</s><pad><pad><pad><pad><pad><pad>",
        "<s> who is known as the father of humanism?</s><pad><pad><pad><pad><pad><pad><pad><pad>"
    ],
    "attention_mask": [
        32,
        20
    ],
    "labels": [
        "<s> Linda Davis</s><pad><pad><pad><pad><pad>",
        "<s> 31</s><pad><pad><pad><pad><pad><pad>",
        "<s> Charles Cornwallis</s><pad><pad><pad><pad>",
        "<s> October 1, 1926</s><pad><pad><pad>",
        "<s> Meat Loaf</s><pad><pad><pad><pad>",
        "<s> Flipper Anderson</s><pad><pad><pad><pad>",
        "<s> a minimum of five</s><pad><pad><pad>",
        "<s> Urchins</s><pad><pad><pad><pad>",
        "<s> Andy Griffith</s><pad><pad><pad><pad><pad>",
        "<s> Ymir Fritz</s><pad><pad><pad><pad>",
        "<s> Blair</s><pad><pad><pad><pad><pad><pad>",
        "<s> American singer Joey Scarbury</s><pad><pad>",
        "<s> Roger Federer</s><pad><pad><pad><pad>",
        "<s> Eoghan McDermott</s><pad><pad>",
        "<s> 2008</s><pad><pad><pad><pad><pad><pad>",
        "<s> List box</s><pad><pad><pad><pad><pad>",
        "<s> William Stromberg</s><pad><pad><pad>",
        "<s> Jorma Taccone</s><pad>",
        "<s> Thomas Joseph Thibodeau Jr.</s>",
        "<s> 15 March 1950</s><pad><pad><pad><pad>",
        "<s> In leaves</s><pad><pad><pad><pad><pad>",
        "<s> American southern rock group Rehab</s><pad><pad>",
        "<s> John Shanks</s><pad><pad><pad><pad>",
        "<s> 1941</s><pad><pad><pad><pad><pad><pad>",
        "<s> May 28, 2018</s><pad><pad><pad>",
        "<s> Tamaki</s><pad><pad><pad><pad><pad>",
        "<s> Sandra Day O'Connor</s><pad><pad>",
        "<s> 1929</s><pad><pad><pad><pad><pad><pad>",
        "<s> animal products</s><pad><pad><pad><pad><pad>",
        "<s> Washington</s><pad><pad><pad><pad><pad><pad>",
        "<s> August 14, 2017</s><pad><pad><pad>",
        "<s> Petrarch</s><pad><pad><pad><pad><pad>"
    ],
"ids": [
        "<s>",
        ".",
        " and",
        "-",
        " is",
        " The",
        " it",
        " be",
        " are",
        " (",
        " will",
        " \ufffd",
        "\ufffd",
        " we",
        " had",
        ",\"",
        " can",
        " $",
        ".\"",
        " year",
        " two",
        " our",
        " into",
        " new",
        " In",
        "I",
        "S",
        "'",
        " 1",
        "?",
        " get",
        " back"
    ],
    "decoder_input_ids": [
        "</s><s> Linda Davis</s><pad><pad><pad><pad>",
        "</s><s> 31</s><pad><pad><pad><pad><pad>",
        "</s><s> Charles Cornwallis</s><pad><pad><pad>",
        "</s><s> October 1, 1926</s><pad><pad>",
        "</s><s> Meat Loaf</s><pad><pad><pad>",
        "</s><s> Flipper Anderson</s><pad><pad><pad>",
        "</s><s> a minimum of five</s><pad><pad>",
        "</s><s> Urchins</s><pad><pad><pad>",
        "</s><s> Andy Griffith</s><pad><pad><pad><pad>",
        "</s><s> Ymir Fritz</s><pad><pad><pad>",
        "</s><s> Blair</s><pad><pad><pad><pad><pad>",
        "</s><s> American singer Joey Scarbury</s><pad>",
        "</s><s> Roger Federer</s><pad><pad><pad>",
        "</s><s> Eoghan McDermott</s><pad>",
        "</s><s> 2008</s><pad><pad><pad><pad><pad>",
        "</s><s> List box</s><pad><pad><pad><pad>",
        "</s><s> William Stromberg</s><pad><pad>",
        "</s><s> Jorma Taccone</s>",
        "</s><s> Thomas Joseph Thibodeau Jr.",
        "</s><s> 15 March 1950</s><pad><pad><pad>",
        "</s><s> In leaves</s><pad><pad><pad><pad>",
        "</s><s> American southern rock group Rehab</s><pad>",
        "</s><s> John Shanks</s><pad><pad><pad>",
        "</s><s> 1941</s><pad><pad><pad><pad><pad>",
        "</s><s> May 28, 2018</s><pad><pad>",
        "</s><s> Tamaki</s><pad><pad><pad><pad>",
        "</s><s> Sandra Day O'Connor</s><pad>",
        "</s><s> 1929</s><pad><pad><pad><pad><pad>",
        "</s><s> animal products</s><pad><pad><pad><pad>",
        "</s><s> Washington</s><pad><pad><pad><pad><pad>",
        "</s><s> August 14, 2017</s><pad><pad>",
        "</s><s> Petrarch</s><pad><pad><pad><pad>"
    ]
}

Comparing this output with the file corresponding to t5-small, two things come to my mind:

  1. In the text_batch.json of t5-small, I don’t see any <pad> or <s> tags, either among labels or input_ids. Is that normal?
  2. For both models, attention_mask is quite short. Shouldn’t this be a list of 0s and 1s, with the same length as input_ids?
  • we just print the shape of attention_mask for brevity.
  • Weird that the t5 version is missing special tokens, whereas bart has them. I suspect this is the problem.
  • Seems like you are truncating more aggressively for T5.

  • we just print the shape of attention_mask for brevity.

Okay, got it.

  • Seems like you are truncating more aggressively for T5.

Yeah, but I think it should cover over 90% of the distribution.

  • Weird that the t5 version is missing special tokens, whereas bart has them. I suspect this is the problem.

Yeah, I think we have a tokenization-related problem. See further findings below:

This is where these log files are dumped; it uses batch_decode(.) to convert the indices into strings. By default, batch_decode(.) keeps the special tokens, so we should definitely be able to see them.
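
(As a quick check of that default behavior, a minimal sketch assuming the t5-small tokenizer:)

from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
pad = tokenizer.pad_token_id                                     # 0 for T5
batch = [tokenizer.encode("who is the US president?") + [pad, pad]]
print(tokenizer.batch_decode(batch))                             # special tokens kept by default
print(tokenizer.batch_decode(batch, skip_special_tokens=True))   # "<pad>" (and "</s>") stripped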

However, when I looked into the tok_batch.json file (snippet shown below), I see that:

  • All the instances in input_ids end with the special id 1, which corresponds to the end-of-sentence special token. So they seem to be good. Not sure why this is not shown correctly in the text_batch.json file.
  • The attention_mask values are fine.
  • decoder_input_ids start with 0 (the pad token, which T5 uses as the decoder start) and end with 1 (end of sentence); so they’re good. Again, not sure why they’re not shown in the output.
{
  "input_ids": [
    [
      113,
      10159,
      7,
      405,
      3,
      88,
      333,
      140,
      28,
      3,
      60,
      1
    ],
    [
      149,
      186,
      1688,
      19,
      17472,
      388,
      57,
      3,
      4900,
      102,
      107,
      1
    ],
    [
      213,
      103,
      8,
      248,
      17721,
      942,
      8,
      5431,
      58,
      1,
      0,
      0
    ]
  ],
  "attention_mask": [
    [
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1
    ],
    [
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1
    ],
    [
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      1,
      0,
      0
    ]
  ],
  "labels": [
    [
      16121,
      8688,
      1,
      0,
      0,
      0,
      0,
      0
    ],
    [
      305,
      4959,
      41,
      511,
      4182,
      3,
      61,
      1
    ],
    [
      8,
      2788,
      16617,
      2473,
      1,
      0,
      0,
      0
    ]
  ],
  "decoder_input_ids": [
    [
      0,
      16121,
      8688,
      1,
      0,
      0,
      0,
      0
    ],
    [
      0,
      305,
      4959,
      41,
      511,
      4182,
      3,
      61
    ],
    [
      0,
      8,
      2788,
      16617,
      2473,
      1,
      0,
      0
    ]
  ]
}
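
As a side note, the relationship between labels and decoder_input_ids in the dump above is just a shift-right by the decoder start id (0, the pad token, for T5). A minimal sketch of that operation (the helper name is mine; it mirrors what the model does internally):

import torch

def shift_right(labels, decoder_start_token_id=0):
    """Prepend the decoder start id and drop the last position."""
    shifted = labels.new_zeros(labels.shape)
    shifted[:, 1:] = labels[:, :-1]
    shifted[:, 0] = decoder_start_token_id
    return shifted

labels = torch.tensor([[16121, 8688, 1, 0, 0, 0, 0, 0]])  # first labels row from the dump
print(shift_right(labels).tolist())                        # [[0, 16121, 8688, 1, 0, 0, 0, 0]]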

Any additional suggestions for debugging, @sshleifer @valhalla?

There is no <s> (bos) token in T5.
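
(A quick way to see this, assuming the t5-small tokenizer:)

from transformers import T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
print(tokenizer.bos_token)  # None (T5 defines no <s> token)
print(tokenizer.eos_token)  # '</s>'
print(tokenizer.pad_token)  # '<pad>'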

Also, it might be useful to train the models and compute the appropriate metric for the QA task, rather than comparing ROUGE or loss, to see how they’re performing.


The appropriate metric is exact-match, which is less lenient than ROUGE.

  • A TPU/TF-trained T5-small scores 23% exact-match.
  • A GPU/HF-trained T5-small barely gets to 15% ROUGE (and ROUGE is more lenient than exact-match).
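
For reference, here is a minimal SQuAD-style exact-match scorer (the normalization details are my own sketch, not the exact script behind the 23% number):

import re
import string

def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(predictions, references):
    """Fraction of predictions that match their reference exactly after normalization."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

print(exact_match(["Wilhelm Conrad Röntgen"], ["Wilhelm Conrad Röntgen"]))  # 1.0
print(exact_match(["December 14, 2018"], ["December 18, 2018"]))            # 0.0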

Visiting from your comment on github!

I am just getting back to this project (fine-tuning T5) this weekend. I was planning to do similar investigations to what you’ve been doing here, since I also suspected something might be wrong with the tokenizer / EOS / masking.

I haven’t run any new experiments yet, but I still don’t quite understand why the model was struggling to learn to output EOS tokens when doing beam search generation (with a high max_len).

Also, per the comment to adjust n_train: I just want to confirm that setting n_train=-1 should have no effect, right? Or am I missing something?