> it still continues to generate many more tokens than it should
That was exactly my observation too, which led me to think that the model is somehow not learning the EOS token (hence generation doesn't stop where it should).
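One quick way to sanity-check this (just a sketch on my side, assuming a T5-style checkpoint; substitute whatever model you're actually fine-tuning) is to confirm that the tokenized targets end with the EOS token, since the model can only learn to stop if EOS actually appears in the labels:

```python
# Sketch only: check that the tokenizer appends EOS to the target text.
# "t5-small" and the example target string are placeholders for your own setup.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
target_line = "a short reference summary"
ids = tokenizer(target_line).input_ids

print(tokenizer.eos_token, tokenizer.eos_token_id)
print(ids[-1] == tokenizer.eos_token_id)  # should be True if EOS ends up in the labels
```

If the last id isn't EOS, there's nothing in the labels teaching the model to stop, which would match the behavior we're both seeing.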
Re. prefix:
Looks like the prefix is set here: https://github.com/huggingface/transformers/blob/master/examples/seq2seq/utils.py#L243
which then seems to be passed in here (from finetune.py):
```python
self.hparams_save_path = Path(self.output_dir) / "hparams.pkl"
pickle_save(self.hparams, self.hparams_save_path)
self.step_count = 0
self.metrics = defaultdict(list)
self.model_type = self.config.model_type
self.vocab_size = self.config.tgt_vocab_size if self.model_type == "fsmt" else self.config.vocab_size
self.dataset_kwargs: dict = dict(
    data_dir=self.hparams.data_dir,
    max_source_length=self.hparams.max_source_length,
    prefix=self.model.config.prefix or "",
)
n_observations_per_split = {
    "train": self.hparams.n_train,
    "val": self.hparams.n_val,
    "test": self.hparams.n_test,
}
self.n_obs = {k: v if v >= 0 else None for k, v in n_observations_per_split.items()}
self.target_lens = {
    "train": self.hparams.max_target_length,
    # ...
```
Where is `self.model.config.prefix` being picked up from? I'm not sure.
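For what it's worth, my understanding (not authoritative) is that `config.prefix` just comes from the model's pretrained config, so you can inspect it directly; something like this should show what the snippet above ends up using:

```python
# Inspection sketch: see what prefix (if any) the pretrained config carries.
# "t5-small" is only an example checkpoint.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("t5-small")
print(config.prefix)                # often None, so finetune.py falls back to ""
print(config.task_specific_params)  # T5 configs keep per-task prefixes like "summarize: " here
```

If that prints `None`, then the `or ""` in the snippet above means no prefix gets prepended to the source lines unless you pass one in explicitly.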