Train Bart for Conditional Generation (e.g. Summarization)

You can ignore the specifying labels index when initial the loss function.