I noticed that in most (or all) dialogPT tutorials, when somebody trains on top of it with their own data, the answers they get back always degenerate into things like “!!!?!?!!;,!.com?!”, “!!!”, or “” after about 3-5 questions. I had this problem in my own training code too. Why is that?
In my experience this correlates with:
- Too little fine-tuning data. I don’t know why that is the case, but I have noticed a significant drop in this “!!!?!?!!;,!.com?!” behaviour once you increase the fine-tuning dataset size.
- This seems to occur only on DialoGPT-small. I have not seen it once on the medium version. This is not a big deal, since if you can train DialoGPT-small, you will generally be able to train DialoGPT-medium on the same GPU.
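If you want to catch this degeneration automatically while evaluating your fine-tuned model, a quick heuristic (my own sketch, not anything from DialoGPT itself; the function name and threshold are made up) is to flag replies that are mostly punctuation:

```python
import string

def is_degenerate(reply: str, threshold: float = 0.5) -> bool:
    """Flag replies dominated by punctuation, e.g. '!!!?!?!!;,!.com?!'.

    threshold is the fraction of punctuation characters above which
    a reply is considered degenerate (0.5 is an arbitrary starting point).
    """
    if not reply.strip():
        return True  # treat empty/whitespace-only replies as degenerate
    punct = sum(ch in string.punctuation for ch in reply)
    return punct / len(reply) >= threshold

print(is_degenerate("!!!?!?!!;,!.com?!"))   # True  (~82% punctuation)
print(is_degenerate("Hello, how are you?")) # False (~11% punctuation)
```

You could run this over a batch of sampled replies every few hundred training steps to see whether the collapse is starting, instead of noticing it only by chatting with the model.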
P.S. You had me confused for a second there. It’s not “dialogPT”, it’s DialoGPT, as it’s based on the GPT-2 model.