Hi, I used this approach to modify the train.py code in this git repo so that it trains the model on the go_emotions dataset. However, I get the following error when I try to fit the model:
2021-09-11 09:21:02,657 sagemaker-training-toolkit INFO Imported framework sagemaker_pytorch_container.training
2021-09-11 09:21:02,688 sagemaker_pytorch_container.training INFO Block until all host DNS lookups succeed.
2021-09-11 09:21:08,914 sagemaker_pytorch_container.training INFO Invoking user training script.
2021-09-11 09:21:09,260 sagemaker-training-toolkit INFO Invoking user script
Training Env:
{
"additional_framework_parameters": {},
"channel_input_dirs": {
"test": "/opt/ml/input/data/test",
"train": "/opt/ml/input/data/train"
},
"current_host": "algo-1",
"framework_module": "sagemaker_pytorch_container.training:main",
"hosts": [
"algo-1"
],
"hyperparameters": {
"train_batch_size": 32,
"model_name": "distilbert-base-uncased",
"epochs": 1
},
"input_config_dir": "/opt/ml/input/config",
"input_data_config": {
"test": {
"TrainingInputMode": "File",
"S3DistributionType": "FullyReplicated",
"RecordWrapperType": "None"
},
"train": {
"TrainingInputMode": "File",
"S3DistributionType": "FullyReplicated",
"RecordWrapperType": "None"
}
},
"input_dir": "/opt/ml/input",
"is_master": true,
"job_name": "ge0908-senti-tj-2021-09-11-09-05-59-2021-09-11-09-15-45-374",
"log_level": 20,
"master_hostname": "algo-1",
"model_dir": "/opt/ml/model",
"module_dir": "s3://sagemaker-ap-southeast-1-178538799605/ge0908-senti-tj-2021-09-11-09-05-59-2021-09-11-09-15-45-374/source/sourcedir.tar.gz",
"module_name": "train",
"network_interface_name": "eth0",
"num_cpus": 4,
"num_gpus": 1,
"output_data_dir": "/opt/ml/output/data",
"output_dir": "/opt/ml/output",
"output_intermediate_dir": "/opt/ml/output/intermediate",
"resource_config": {
"current_host": "algo-1",
"hosts": [
"algo-1"
],
"network_interface_name": "eth0"
},
"user_entry_point": "train.py"
}
Environment variables:
SM_HOSTS=["algo-1"]
SM_NETWORK_INTERFACE_NAME=eth0
SM_HPS={"epochs":1,"model_name":"distilbert-base-uncased","train_batch_size":32}
SM_USER_ENTRY_POINT=train.py
SM_FRAMEWORK_PARAMS={}
SM_RESOURCE_CONFIG={"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"eth0"}
SM_INPUT_DATA_CONFIG={"test":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"train":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}}
SM_OUTPUT_DATA_DIR=/opt/ml/output/data
SM_CHANNELS=["test","train"]
SM_CURRENT_HOST=algo-1
SM_MODULE_NAME=train
SM_LOG_LEVEL=20
SM_FRAMEWORK_MODULE=sagemaker_pytorch_container.training:main
SM_INPUT_DIR=/opt/ml/input
SM_INPUT_CONFIG_DIR=/opt/ml/input/config
SM_OUTPUT_DIR=/opt/ml/output
SM_NUM_CPUS=4
SM_NUM_GPUS=1
SM_MODEL_DIR=/opt/ml/model
SM_MODULE_DIR=s3://sagemaker-ap-southeast-1-178538799605/ge0908-senti-tj-2021-09-11-09-05-59-2021-09-11-09-15-45-374/source/sourcedir.tar.gz
SM_TRAINING_ENV={"additional_framework_parameters":{},"channel_input_dirs":{"test":"/opt/ml/input/data/test","train":"/opt/ml/input/data/train"},"current_host":"algo-1","framework_module":"sagemaker_pytorch_container.training:main","hosts":["algo-1"],"hyperparameters":{"epochs":1,"model_name":"distilbert-base-uncased","train_batch_size":32},"input_config_dir":"/opt/ml/input/config","input_data_config":{"test":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"},"train":{"RecordWrapperType":"None","S3DistributionType":"FullyReplicated","TrainingInputMode":"File"}},"input_dir":"/opt/ml/input","is_master":true,"job_name":"ge0908-senti-tj-2021-09-11-09-05-59-2021-09-11-09-15-45-374","log_level":20,"master_hostname":"algo-1","model_dir":"/opt/ml/model","module_dir":"s3://sagemaker-ap-southeast-1-178538799605/ge0908-senti-tj-2021-09-11-09-05-59-2021-09-11-09-15-45-374/source/sourcedir.tar.gz","module_name":"train","network_interface_name":"eth0","num_cpus":4,"num_gpus":1,"output_data_dir":"/opt/ml/output/data","output_dir":"/opt/ml/output","output_intermediate_dir":"/opt/ml/output/intermediate","resource_config":{"current_host":"algo-1","hosts":["algo-1"],"network_interface_name":"eth0"},"user_entry_point":"train.py"}
SM_USER_ARGS=["--epochs","1","--model_name","distilbert-base-uncased","--train_batch_size","32"]
SM_OUTPUT_INTERMEDIATE_DIR=/opt/ml/output/intermediate
SM_CHANNEL_TEST=/opt/ml/input/data/test
SM_CHANNEL_TRAIN=/opt/ml/input/data/train
SM_HP_TRAIN_BATCH_SIZE=32
SM_HP_MODEL_NAME=distilbert-base-uncased
SM_HP_EPOCHS=1
PYTHONPATH=/opt/ml/code:/opt/conda/bin:/opt/conda/lib/python36.zip:/opt/conda/lib/python3.6:/opt/conda/lib/python3.6/lib-dynload:/opt/conda/lib/python3.6/site-packages
Invoking script with the following command:
/opt/conda/bin/python3.6 train.py --epochs 1 --model_name distilbert-base-uncased --train_batch_size 32
2021-09-11 09:21:12,708 - __main__ - INFO - loaded train_dataset length is: 100
2021-09-11 09:21:12,709 - __main__ - INFO - loaded test_dataset length is: 5427
2021-09-11 09:21:13,603 - filelock - INFO - Lock 140033116071920 acquired on /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333.lock
2021-09-11 09:21:14,456 - filelock - INFO - Lock 140033116071920 released on /root/.cache/huggingface/transformers/23454919702d26495337f3da04d1655c7ee010d5ec9d77bdb9e399e00302c0a1.91b885ab15d631bf9cee9dc9d25ece0afd932f2f5130eba28f2055b2220c0333.lock
2021-09-11 09:21:15,317 - filelock - INFO - Lock 140032915610704 acquired on /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a.lock
2021-09-11 09:21:20,460 - filelock - INFO - Lock 140032915610704 released on /root/.cache/huggingface/transformers/9c169103d7e5a73936dd2b627e42851bec0831212b677c637033ee4bce9ab5ee.126183e36667471617ae2f0835fab707baa54b731f991507ebbb55ea85adb12a.lock
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.bias', 'classifier.bias', 'classifier.weight', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
2021-09-11 09:21:22,834 - filelock - INFO - Lock 140032911509432 acquired on /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-09-11 09:21:24,522 - filelock - INFO - Lock 140032911509432 released on /root/.cache/huggingface/transformers/0e1bbfda7f63a99bb52e3915dcf10c3c92122b827d92eb2d34ce94ee79ba486c.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
2021-09-11 09:21:25,373 - filelock - INFO - Lock 140032915671416 acquired on /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
2021-09-11 09:21:27,294 - filelock - INFO - Lock 140032915671416 released on /root/.cache/huggingface/transformers/75abb59d7a06f4f640158a9bfcde005264e59e8d566781ab1415b139d2e4c603.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
2021-09-11 09:21:29,862 - filelock - INFO - Lock 140032915672088 acquired on /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
2021-09-11 09:21:30,714 - filelock - INFO - Lock 140032915672088 released on /root/.cache/huggingface/transformers/8c8624b8ac8aa99c60c912161f8332de003484428c47906d7ff7eb7f73eecdbb.20430bd8e10ef77a7d2977accefe796051e01bc2fc4aa146bc862997a1a15e79.lock
[2021-09-11 09:21:34.048 algo-1:26 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
[2021-09-11 09:21:34.144 algo-1:26 INFO profiler_config_parser.py:102] User has disabled profiler.
[2021-09-11 09:21:34.144 algo-1:26 INFO json_config.py:91] Creating hook from json_config at /opt/ml/input/config/debughookconfig.json.
[2021-09-11 09:21:34.144 algo-1:26 INFO hook.py:201] tensorboard_dir has not been set for the hook. SMDebug will not be exporting tensorboard summaries.
[2021-09-11 09:21:34.145 algo-1:26 INFO hook.py:255] Saving to /opt/ml/output/tensors
[2021-09-11 09:21:34.146 algo-1:26 INFO state_store.py:77] The checkpoint config file /opt/ml/input/config/checkpointconfig.json does not exist.
2021-09-11 09:21:44 Uploading - Uploading generated training model
Downloading: 100%|██████████| 483/483 [00:00<00:00, 438kB/s]
Downloading: 100%|██████████| 268M/268M [00:05<00:00, 53.3MB/s]
Some weights of the model checkpoint at distilbert-base-uncased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.weight', 'vocab_projector.bias', 'vocab_transform.weight', 'vocab_transform.bias', 'vocab_projector.weight', 'vocab_layer_norm.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['pre_classifier.bias', 'classifier.bias', 'classifier.weight', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Downloading: 100%|██████████| 232k/232k [00:00<00:00, 369kB/s]
Downloading: 100%|██████████| 466k/466k [00:00<00:00, 552kB/s]
Downloading: 100%|██████████| 28.0/28.0 [00:00<00:00, 24.6kB/s]
0%|          | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 699, in convert_to_tensors
tensor = as_tensor(value)
ValueError: expected sequence of length 1 at dim 1 (got 2)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 98, in <module>
trainer.train()
File "/opt/conda/lib/python3.6/site-packages/transformers/trainer.py", line 1246, in train
for step, inputs in enumerate(epoch_iterator):
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 444, in __next__
(data, worker_id) = self._next_data()
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 526, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
return self.collate_fn(data)
File "/opt/conda/lib/python3.6/site-packages/transformers/data/data_collator.py", line 123, in __call__
return_tensors="pt",
File "/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 2680, in pad
return BatchEncoding(batch_outputs, tensor_type=return_tensors)
File "/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 204, in __init__
self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
File "/opt/conda/lib/python3.6/site-packages/transformers/tokenization_utils_base.py", line 716, in convert_to_tensors
"Unable to create tensor, you should probably activate truncation and/or padding "
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.
2021-09-11 09:21:34,866 sagemaker-training-toolkit ERROR ExecuteUserScriptError:
Command "/opt/conda/bin/python3.6 train.py --epochs 1 --model_name distilbert-base-uncased --train_batch_size 32"
2021-09-11 09:22:44 Failed - Training job failed
---------------------------------------------------------------------------
UnexpectedStatusException Traceback (most recent call last)
<ipython-input-18-b69b5ed45b37> in <module>
3 experiment_config = {
4 "ExperimentName": experiment_name,
----> 5 "TrialName" : trial.trial_name}
6 )
/opt/conda/lib/python3.6/site-packages/sagemaker/estimator.py in fit(self, inputs, wait, logs, job_name, experiment_config)
684 self.jobs.append(self.latest_training_job)
685 if wait:
--> 686 self.latest_training_job.wait(logs=logs)
687
688 def _compilation_job_name(self):
/opt/conda/lib/python3.6/site-packages/sagemaker/estimator.py in wait(self, logs)
1629 # If logs are requested, call logs_for_jobs.
1630 if logs != "None":
-> 1631 self.sagemaker_session.logs_for_job(self.job_name, wait=True, log_type=logs)
1632 else:
1633 self.sagemaker_session.wait_for_job(self.job_name)
/opt/conda/lib/python3.6/site-packages/sagemaker/session.py in logs_for_job(self, job_name, wait, poll, log_type)
3674
3675 if wait:
-> 3676 self._check_job_status(job_name, description, "TrainingJobStatus")
3677 if dot:
3678 print()
/opt/conda/lib/python3.6/site-packages/sagemaker/session.py in _check_job_status(self, job, desc, status_key_name)
3234 ),
3235 allowed_statuses=["Completed", "Stopped"],
-> 3236 actual_status=status,
3237 )
3238
UnexpectedStatusException: Error for Training job ge0908-senti-tj-2021-09-11-09-05-59-2021-09-11-09-15-45-374: Failed. Reason: AlgorithmError: ExecuteUserScriptError:
Command "/opt/conda/bin/python3.6 train.py --epochs 1 --model_name distilbert-base-uncased --train_batch_size 32"
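From the traceback, my guess is that the real problem is the label format rather than the tokenizer itself: go_emotions is annotated as multi-label, so the "labels" column holds a list of emotion ids per example, and some examples carry two or more ids. When the padding collator turns such a ragged batch into tensors, the labels lists cannot be stacked, which would produce exactly the "expected sequence of length 1 at dim 1 (got 2)" error above. Here is a minimal sketch of what I think is happening (the two feature dicts below are made up for illustration, they are not taken from the repo):

```python
from transformers import AutoTokenizer, DataCollatorWithPadding

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
collator = DataCollatorWithPadding(tokenizer)

# Two fake pre-tokenized examples: the first has one go_emotions label,
# the second has two, so the "labels" lists are ragged.
features = [
    {"input_ids": [101, 2023, 102], "attention_mask": [1, 1, 1], "labels": [0]},
    {"input_ids": [101, 2023, 2003, 102], "attention_mask": [1, 1, 1, 1], "labels": [2, 27]},
]

# tokenizer.pad() pads input_ids/attention_mask but passes "labels" through
# untouched, so converting the batch to tensors raises:
#   ValueError: expected sequence of length 1 at dim 1 (got 2)
batch = collator(features)
```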
Kindly advise how I can edit train.py in this repo so that it performs multiclass text classification on go_emotions.
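For reference, this is roughly the direction I was thinking of for the data preparation: collapse each example to a single emotion id so the task becomes ordinary multiclass classification, and tokenize with truncation and padding. This is only a sketch under my assumptions (go_emotions loaded with the datasets library in its "simplified" config; the variable and function names are mine, not the repo's):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

raw = load_dataset("go_emotions", "simplified")
# number of emotion classes (28 in the "simplified" config, if I read the feature schema correctly)
num_labels = raw["train"].features["labels"].feature.num_classes

def to_single_label(example):
    # go_emotions is multi-label; for a multiclass setup keep one id per example
    # (here simply the first annotated emotion).
    example["labels"] = example["labels"][0]
    return example

def tokenize(batch):
    # fixed-length padding + truncation so every example has the same tensor shape
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_dataset = raw["train"].map(to_single_label).map(tokenize, batched=True)
test_dataset = raw["test"].map(to_single_label).map(tokenize, batched=True)
train_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])
test_dataset.set_format("torch", columns=["input_ids", "attention_mask", "labels"])

# The classification head would then need the matching number of classes, e.g.
# AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
```

Does that look like the right direction, or should the multi-label annotations be handled differently (e.g. kept as-is with a multi-label head)?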