What does model.generate do that I'm not doing?

I never solved this, but it is most likely a combination of the following:

  1. I'm not using the decoding methods that generate applies at inference time (top-k, top-p, beam search).
  2. The format does not match the original training data.
  3. Hugging Face quirks.
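Point 1 is the easiest to check in isolation. Depending on the model's generation config, model.generate may run plain greedy decoding, or it may sample with temperature, top-k and top-p (or run beam search). A hand-written argmax loop can therefore produce very different text from generate even with identical weights. Below is a minimal, library-free sketch contrasting greedy decoding with temperature/top-k/top-p sampling. It is not the Hugging Face implementation. `toy_model` is a hypothetical stand-in for a real model's next-token logits, and beam search is not shown.

```python
import math
import random
from typing import Callable, List, Sequence

NEG_INF = float("-inf")


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert logits to probabilities; -inf logits get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(logits: Sequence[float], k: int) -> List[float]:
    """Keep the k largest logits (ties at the cutoff are kept) and mask the rest."""
    if k <= 0 or k >= len(logits):
        return list(logits)
    cutoff = sorted(logits, reverse=True)[k - 1]
    return [x if x >= cutoff else NEG_INF for x in logits]


def top_p_filter(logits: Sequence[float], p: float) -> List[float]:
    """Keep the smallest set of most likely tokens whose total probability reaches p."""
    probs = softmax(logits)
    order = sorted(range(len(logits)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return [x if i in keep else NEG_INF for i, x in enumerate(logits)]


def greedy_next(logits: Sequence[float]) -> int:
    """Pick the single most likely token (a plain argmax)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_next(logits: Sequence[float], rng: random.Random,
                temperature: float = 1.0, top_k: int = 0, top_p: float = 1.0) -> int:
    """Apply temperature, then top-k, then top-p, then draw one token at random."""
    scaled = [x / temperature for x in logits]
    filtered = top_k_filter(scaled, top_k)
    if top_p < 1.0:
        filtered = top_p_filter(filtered, top_p)
    probs = softmax(filtered)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate(next_token_logits: Callable[[List[int]], List[float]],
             prompt: Sequence[int], max_new_tokens: int,
             choose: Callable[[List[float]], int]) -> List[int]:
    """Autoregressive loop: score the sequence so far, pick a token, append, repeat."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(choose(next_token_logits(tokens)))
    return tokens


def toy_model(tokens: List[int]) -> List[float]:
    """Hypothetical stand-in for a model: strongly prefers (last token + 1) mod 4."""
    nxt = (tokens[-1] + 1) % 4
    return [3.0 if i == nxt else 0.0 for i in range(4)]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5, -1.0]
    print(top_k_filter(logits, 2))    # [2.0, 1.0, -inf, -inf]
    print(top_p_filter(logits, 0.8))  # [2.0, 1.0, -inf, -inf]
    print(generate(toy_model, [0], 3, greedy_next))  # [0, 1, 2, 3]

    rng = random.Random(0)

    def sampled(lg: List[float]) -> int:
        return sample_next(lg, rng, temperature=1.0, top_k=2)

    # Sampled output varies with the seed, unlike the greedy run above.
    print(generate(toy_model, [0], 3, sampled))
```

To compare against a real model, one could replace `toy_model` with a function returning the last-position logits from that model and set temperature, top-k and top-p to match whatever generation config generate is actually using.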

While I still feel the outputs should at least be in the same ballpark, it’s understandable that the training output is poor.

My suggestion is to use Hugging Face sparingly. It's easy to use, but that ease comes at the cost of hiding what's going on. I've spent a lot of time trying to figure out what the Hugging Face code is doing when I get unusual results. Unless you are doing "standard" work, it's best to avoid it. Building the system from lower-level code takes more time and may be more complex, but it normally pays off because the result is easier to debug and understand.

You can see another issue I have that shows the potential weirdness of using Hugging Face methods.