Looking at the source code for run_summarization.py, the error seems to be raised in the __post_init__ call of DataTrainingArguments, here:
    if self.dataset_name is None and self.train_file is None and self.validation_file is None:
        raise ValueError("Need either a dataset name or a training/validation file.")
So it looks like the data from S3 is not passed to train_file. Looking at the train_file argument, it seems to expect a CSV or JSON file rather than a location on S3:
    train_file: Optional[str] = field(
        default=None, metadata={"help": "The input training data file (a jsonlines or csv file)."}
    )
Am I right in concluding from this that the run_summarization script cannot be used with data from S3?
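
For reference, here is a minimal sketch that reproduces the check in isolation. MinimalDataArgs and check are hypothetical names of my own, a stripped-down stand-in for DataTrainingArguments that keeps only the three fields the condition looks at; it is not the actual class from the script, and the bucket path in the usage note is made up.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MinimalDataArgs:
    """Hypothetical stand-in for DataTrainingArguments, reduced to three fields."""

    dataset_name: Optional[str] = field(default=None)
    train_file: Optional[str] = field(default=None)
    validation_file: Optional[str] = field(default=None)

    def __post_init__(self):
        # Same condition as the check quoted from run_summarization.py.
        if self.dataset_name is None and self.train_file is None and self.validation_file is None:
            raise ValueError("Need either a dataset name or a training/validation file.")


def check(**kwargs) -> str:
    """Return "ok" if the arguments pass the check, otherwise the error message."""
    try:
        MinimalDataArgs(**kwargs)
        return "ok"
    except ValueError as err:
        return str(err)
```

In this sketch, check() with no arguments returns the error message, while check(train_file="s3://example-bucket/train.json") returns "ok", because the condition only tests for None. If the real class behaves the same way at this point, the error would suggest that none of the three arguments reached the script at all, rather than that an S3 path was rejected.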