What is the difference between forward() and generate()?


Some models implement both functions, and semantically they seem to behave similarly, though they may be implemented differently. What is the difference? In both cases, does the model produce a prediction (inference) for an input sequence?





  • forward() runs a single pass through the model and returns its raw outputs (e.g. logits). It can be used for both training and inference.
  • generate() can only be used at inference time, and calls forward() behind the scenes, once per generated token. It implements several decoding strategies, such as beam search and top-k sampling (a detailed blog post can be found here).
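
To make the relationship concrete, here is a minimal toy sketch in plain Python. It is not the actual library implementation: ToyLM, its 5-token vocabulary, the "favour token + 1" scoring rule, and the max_new_tokens default are all made up for illustration. It only shows the shape of the idea: forward() scores the next token at every position in one pass, and generate() is a loop around forward() that uses greedy decoding, the simplest decoding strategy.

```python
# Toy illustration only: a hypothetical model, not a real library API.
class ToyLM:
    """Hypothetical model over a 5-token vocabulary; token 4 marks end-of-sequence."""

    vocab_size = 5
    eos_token_id = 4

    def forward(self, input_ids):
        """Single pass: return one row of logits (a score per vocabulary token) per position."""
        logits = []
        for tok in input_ids:
            row = [0.0] * self.vocab_size
            row[(tok + 1) % self.vocab_size] = 1.0  # toy rule: favour token + 1
            logits.append(row)
        return logits

    def generate(self, input_ids, max_new_tokens=10):
        """Greedy decoding: call forward() repeatedly and append the top-scoring token."""
        ids = list(input_ids)
        for _ in range(max_new_tokens):
            next_logits = self.forward(ids)[-1]  # only the last position matters here
            next_id = max(range(self.vocab_size), key=next_logits.__getitem__)
            ids.append(next_id)
            if next_id == self.eos_token_id:  # stop once end-of-sequence is produced
                break
        return ids


model = ToyLM()
logits = model.forward([0, 1])       # raw scores, one row per input position
output_ids = model.generate([0, 1])  # input tokens followed by the generated ones
```

During training you would compare the forward() logits against the target tokens to compute a loss. At inference time, generate() takes over the loop; strategies like beam search or sampling differ only in how the next token is chosen from the logits.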
