Hi everyone,
I spent 2 days and 30 training job runs trying to deploy LayoutLMv3 to SageMaker. So far, very little success, which is why I am coming to the community for help.
Can you please help me find the cause of the issue? Here is the error I get right before the training job fails:
KeyError: 'layoutlmv3'"
Here is the wider log context around that error:
model_args.model_name_or_path: microsoft/layoutlmv3-large
model_args.config_name: None
[INFO|file_utils.py:2215] 2022-12-16 14:11:36,731 >> https://huggingface.co/microsoft/layoutlmv3-large/resolve/main/config.json not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpjakyi4na
Downloading: 0%| | 0.00/857 [00:00<?, ?B/s]
Downloading: 100%|██████████| 857/857 [00:00<00:00, 872kB/s]
[INFO|file_utils.py:2219] 2022-12-16 14:11:37,111 >> storing https://huggingface.co/microsoft/layoutlmv3-large/resolve/main/config.json in cache at /root/.cache/huggingface/transformers/8a0e5de726f43163aedf8dc08186e7ef3b2c706adc3a01a024a6427d09e4e3f0.9009c531534232ef27cf370ef50d8628b965e90eb385fd924a3a02fd9af07213
[INFO|file_utils.py:2227] 2022-12-16 14:11:37,111 >> creating metadata file for /root/.cache/huggingface/transformers/8a0e5de726f43163aedf8dc08186e7ef3b2c706adc3a01a024a6427d09e4e3f0.9009c531534232ef27cf370ef50d8628b965e90eb385fd924a3a02fd9af07213
[INFO|configuration_utils.py:648] 2022-12-16 14:11:37,111 >> loading configuration file https://huggingface.co/microsoft/layoutlmv3-large/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/8a0e5de726f43163aedf8dc08186e7ef3b2c706adc3a01a024a6427d09e4e3f0.9009c531534232ef27cf370ef50d8628b965e90eb385fd924a3a02fd9af07213
Traceback (most recent call last):
  File "run_ner.py", line 631, in <module>
    main()
  File "run_ner.py", line 346, in main
    config = AutoConfig.from_pretrained(
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 657, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/opt/conda/lib/python3.8/site-packages/transformers/models/auto/configuration_auto.py", line 372, in __getitem__
    raise KeyError(key)
KeyError: 'layoutlmv3'
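If it helps with debugging: here is a minimal sketch (assuming the same transformers==4.17.0 that the estimator below pins) that should hit the same code path as the failing AutoConfig.from_pretrained call in run_ner.py:

import transformers
from transformers import AutoConfig

print(transformers.__version__)  # should match the version pinned in the estimator
try:
    # Same call that fails inside run_ner.py
    AutoConfig.from_pretrained("microsoft/layoutlmv3-large")
    print("layoutlmv3 config loaded fine")
except KeyError as err:
    # Reproduces the KeyError from the training job if the installed
    # transformers release does not recognize the 'layoutlmv3' model type
    print("KeyError:", err)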
Here is the code that launches the training job:
import sagemaker
from sagemaker.huggingface import HuggingFace
import botocore
from datasets.filesystems import S3FileSystem
sess = sagemaker.Session()
role = sagemaker.get_execution_role()
role_name = role.split('/')[-1]
sagemaker_session_bucket = 'public-layoutlm3-training-data'
git_config = {'repo': 'https://github.com/pavel-nesterov/diploma-transformers-for-layoutLM-with-load_from_disk.git','branch': 'pavel-save-to-disk'}
instance = "ml.g4dn.xlarge"
training_input_path = f's3://{sagemaker_session_bucket}'
test_input_path = f's3://{sagemaker_session_bucket}'
huggingface_estimator = HuggingFace(
    entry_point='run_ner.py',
    source_dir='./examples/pytorch/token-classification',
    instance_type=instance,
    instance_count=1,
    role=role,
    git_config=git_config,
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    #use_spot_instances=True,
    #max_wait=60,
    #max_run=1200,
    hyperparameters={
        'model_name_or_path': 'microsoft/layoutlmv3-large',
        'output_dir': '/opt/ml/model',
        'train_file': '/opt/ml/input/data/train/train_split.json',
        'validation_file': '/opt/ml/input/data/test/eval_split.json',
        'do_train': True,
    }
)
huggingface_estimator.fit({'train': training_input_path, 'test': test_input_path})
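For context on the paths in the hyperparameters: as far as I understand, SageMaker mounts each channel passed to fit() at /opt/ml/input/data/<channel_name> inside the training container, so the 'train' and 'test' channels above should end up at /opt/ml/input/data/train and /opt/ml/input/data/test, which is why train_file and validation_file point there.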