Using custom csv data with run_summarization.py in sagemaker

I could reproduce your problem, and it comes from the hyperparameter definition. Inside the training job, the data files are saved to the following directories:

SM_CHANNEL_TEST=/opt/ml/input/data/test
SM_CHANNEL_TRAIN=/opt/ml/input/data/train

So the hyperparameters should point to those container paths:

'train_file': '/opt/ml/input/data/train/train_20210607.csv'
'validation_file': '/opt/ml/input/data/test/val_20210607.csv'
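A minimal sketch of building that hyperparameter dict, assuming the channels are named "train" and "test" (the keys you pass to estimator.fit) and that the CSV file names are the ones above. build_hyperparameters is a hypothetical helper, not part of the SageMaker SDK or run_summarization.py:

```python
import posixpath

# SageMaker copies each input channel passed to estimator.fit() into
# /opt/ml/input/data/<channel_name> inside the training container.
SM_DATA_ROOT = "/opt/ml/input/data"


def build_hyperparameters(train_name, validation_name,
                          train_channel="train", validation_channel="test"):
    """Return file hyperparameters that point at container paths.

    Hypothetical helper: the channel names must match the keys used in
    estimator.fit({...}); 'train' and 'test' are assumed defaults here.
    """
    return {
        "train_file": posixpath.join(SM_DATA_ROOT, train_channel, train_name),
        "validation_file": posixpath.join(SM_DATA_ROOT, validation_channel, validation_name),
    }


hyperparameters = build_hyperparameters("train_20210607.csv", "val_20210607.csv")
print(hyperparameters)
```

You would then pass this dict (merged with your other run_summarization.py arguments) as hyperparameters= to the HuggingFace estimator and call estimator.fit({'train': <your S3 train URI>, 'test': <your S3 test URI>}), so the channel names in the paths match the channel keys.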

The environment variables SM_HP_VALIDATION_FILE and SM_HP_TRAIN_FILE just hold the values you put in the hyperparameter dict; they do not tell you where the files are stored. The storage locations are given by the SM_CHANNEL_* variables.
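If you would rather keep only bare file names in the hyperparameters, a custom wrapper script could resolve them against the channel directory itself. This is a minimal sketch under that assumption; resolve_channel_file is a hypothetical helper (run_summarization.py simply reads whatever --train_file it is given), and the environment values below are set only to simulate the container locally:

```python
import os
import posixpath


def resolve_channel_file(channel, filename):
    """Join a file name onto the directory SageMaker uses for `channel`.

    SM_CHANNEL_<NAME> holds the directory for channel <name>. The
    /opt/ml/input/data/<name> fallback is an assumption for local testing.
    """
    channel_dir = os.environ.get(
        "SM_CHANNEL_" + channel.upper(),
        posixpath.join("/opt/ml/input/data", channel),
    )
    return posixpath.join(channel_dir, filename)


# Local simulation only: assume train_file was set to a bare file name.
os.environ.setdefault("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train")
os.environ.setdefault("SM_HP_TRAIN_FILE", "train_20210607.csv")

raw_value = os.environ["SM_HP_TRAIN_FILE"]  # the hyperparameter value, not a path
train_path = resolve_channel_file("train", raw_value)
print(train_path)
```

Putting the full /opt/ml/input/data/... paths directly into the hyperparameters, as shown earlier, avoids needing any wrapper at all.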
