I’ve been trying to train T5 on a custom dataset similar to SQuAD v2 by modifying the T5-on-TPU Colab written by Suraj Patil (I have Colab Pro, so I train on a high-RAM TPU instance). The notebook I’ve linked trains on SQuAD v2 directly and shows the same problem.
link: Google Colaboratory
However, no matter how long I train, the loss stays essentially constant, i.e. the model does not seem to learn. This is true both for the loss recorded during training and for the loss during the validation phase.
Could someone please tell me what I am doing wrong? I am going a little crazy trying to figure it out.
Thank you in advance all.
I’ve corrected the following issues thus far:
- Different XLA import at start
- Modified the code in the eos/encoder section to allow for answer-less questions under SQuAD v2 (as opposed to SQuAD v1 in Suraj’s original code)
- Edited data imports to use huggingface’s datasets.load_dataset instead of the old nlp library
- Under T2TDataCollator, modified batching to ensure that the inputs are tensors instead of lists, e.g.: torch.FloatTensor(example['input_ids']).to(torch.int64)
- Pinned transformers to version 2.9.1 to allow for Suraj’s particular usage of T5DataCollator (I also created a version against the current transformers release, which shows the same problem described above)
- (If I use the current version of transformers rather than 2.9.1, I also modify T2TDataCollator and the labels generated in the training phase to use 'labels' and 'decoder_attention_mask' instead of 'target_ids' and 'target_attention_mask')
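For concreteness, here is a minimal sketch of the collator change I mean: per-example lists stacked into int64 tensors, with the keys renamed to what the current transformers Trainer expects. The field names and the pad-to--100 replacement are my own choices (T5's pad token id is 0, and -100 is the default ignore index for the loss), not code from Suraj's notebook:

```python
import torch

def collate_batch(examples):
    """Stack per-example token lists into int64 tensors and use the
    key names the current transformers Trainer expects ('labels' /
    'decoder_attention_mask' rather than 'target_ids' /
    'target_attention_mask'). A sketch, not the notebook's exact code."""
    input_ids = torch.tensor(
        [ex["input_ids"] for ex in examples], dtype=torch.long)
    attention_mask = torch.tensor(
        [ex["attention_mask"] for ex in examples], dtype=torch.long)
    labels = torch.tensor(
        [ex["target_ids"] for ex in examples], dtype=torch.long)
    decoder_attention_mask = torch.tensor(
        [ex["target_attention_mask"] for ex in examples], dtype=torch.long)
    # Replace pad tokens (id 0 for T5) with -100 so the loss ignores
    # padded label positions -- a common cause of flat loss if skipped.
    labels[labels == 0] = -100
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels,
        "decoder_attention_mask": decoder_attention_mask,
    }
```

If the collator still emits 'target_ids' under a recent transformers version, the model never receives labels at all, which would also produce a constant loss.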
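And for the answer-less SQuAD v2 questions, a sketch of the kind of preprocessing I mean (the "no answer" target string and the prompt format are my own assumptions, not the notebook's exact code):

```python
def format_example(example):
    """Build T5 input/target text from a SQuAD-style example.
    SQuAD v2 examples may have an empty answers list; here those map
    to a fixed 'no answer' target string (a design choice)."""
    input_text = "question: %s  context: %s" % (
        example["question"], example["context"])
    answers = example["answers"]["text"]
    target_text = answers[0] if answers else "no answer"
    return {"input_text": input_text, "target_text": target_text}
```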