How can I tell what each dataset was used for?

Hello,
Many model cards list datasets, sometimes several different ones. When it isn't explicitly specified, how can I determine which of them were used for training, fine-tuning, or evaluation?

For example, in the model card for sileod/deberta-v3-base-tasksource-nli, many datasets are listed. How can I find out which specific ones were actually used for training?

Thanks!


Dataset tags are sometimes assigned automatically by the trainer, but in general they are an optional field that the model author fills in. There is no established way to get further details, such as how each dataset was used, from the tags alone.
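If you want to look at the declared tags programmatically rather than on the web page, something like the following may help. It is only a sketch: it assumes the `huggingface_hub` package is installed and that you have network access, and the helper names `normalize_datasets` and `declared_datasets` are made up for this example. It shows what the author declared, not how each dataset was used.

```python
from typing import List, Union


def normalize_datasets(value: Union[None, str, List[str]]) -> List[str]:
    """Return the card's `datasets` field as a list.

    The field is optional, so it may be missing (None), a single string,
    or a list of dataset names.
    """
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


def declared_datasets(repo_id: str) -> List[str]:
    """Fetch a model card from the Hub and return its declared dataset tags.

    Hypothetical helper for illustration; requires huggingface_hub and network access.
    """
    # Imported lazily so normalize_datasets works without huggingface_hub installed.
    from huggingface_hub import ModelCard

    card = ModelCard.load(repo_id)  # downloads the repo's README.md
    return normalize_datasets(getattr(card.data, "datasets", None))
```

Calling `declared_datasets("sileod/deberta-v3-base-tasksource-nli")` should return the same list you see on the model page. An empty list only means the author did not fill in the field.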

However, for models tied to an academic paper, more detail is often available: the paper or the GitHub repository linked from the model card may include the actual training code. That kind of information appears to be available for this model.
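To find those links quickly in a long model card, you could scan the card text for repository and paper URLs. This is a rough sketch using only the standard library; the function name `find_source_links`, the list of hosts it checks, and the URLs in the sample text are all placeholders chosen for this example, not anything taken from the actual card.

```python
import re
from typing import Dict, List

# Matches http(s) URLs up to whitespace or a closing bracket/quote.
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def find_source_links(card_text: str) -> Dict[str, List[str]]:
    """Group URLs found in a model card into likely code and paper links."""
    links: Dict[str, List[str]] = {"code": [], "paper": []}
    for url in URL_PATTERN.findall(card_text):
        url = url.rstrip(".,;")  # drop trailing sentence punctuation
        if "github.com" in url:
            bucket = "code"
        elif "arxiv.org" in url or "aclanthology.org" in url:
            bucket = "paper"
        else:
            continue  # ignore links that are neither code nor paper hosts
        if url not in links[bucket]:
            links[bucket].append(url)
    return links


# Placeholder card text with made-up URLs, for illustration only.
sample = (
    "Code: [repo](https://github.com/example-user/example-repo). "
    "Paper: https://arxiv.org/abs/0000.00000, see also https://example.com/x"
)
result = find_source_links(sample)
```

You would pass in the README text of the model you are interested in. The links it finds are just starting points; you still have to read the paper or training code to see which datasets were used for what.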

You can also contact the author directly through the community section of the model's page.