Hello,
I have been trying to fine-tune GPT-2 for causal language modelling. I have a sample dataset of 320 examples, of which 300 are used for training and 20 for evaluation.
Once the training job completes, the training metrics state that only 5 training samples were used. I need to use a batch size of 2 because I run into CUDA memory issues otherwise. Are there any other parameters I should be changing to increase the number of samples actually used?
Here is my code below:
hyperparameters = {
    'model_name_or_path': 'gpt2',
    'output_dir': '/opt/ml/model',
    'train_file': 'https://dev-gptj-training.notebook.eu-west-1.sagemaker.aws/edit/input_data/raw_data/ft_input_data.txt',
    'validation_file': 'https://dev-gptj-training.notebook.eu-west-1.sagemaker.aws/edit/input_data/raw_data/ft_input_data_eval.txt',
    'do_train': True,
    'do_eval': True,
    'per_device_eval_batch_size': 2,
    'per_device_train_batch_size': 2,
    'gradient_accumulation_steps': 8
}

# git configuration to download our fine-tuning script
git_config = {'repo': 'https://github.com/huggingface/transformers.git', 'branch': 'v4.17.0'}

# create the Hugging Face estimator
huggingface_estimator = HuggingFace(
    entry_point='run_clm.py',
    source_dir='./examples/pytorch/language-modeling',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    git_config=git_config,
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    hyperparameters=hyperparameters,
    output_path=output_bucket,
    base_job_name='GPT2-v1'
)

# start the training job
huggingface_estimator.fit(inputs={
    'training': 's3://1111111111111-dev-gpt2-datasets/gpt-2/datasets/ft_input_data_sunday.txt',
    'test': 's3://1111111111111-dev-gpt2-datasets/gpt-2/datasets/ft_input_data_sunday_eval.txt'
})
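For what it's worth, this is how I understand the batch settings above to combine on a single-GPU ml.p3.2xlarge (just my own back-of-the-envelope check with illustrative variable names, not anything taken from run_clm.py itself):

per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 1  # ml.p3.2xlarge has a single V100 GPU
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # 16, which matches "Total train batch size (w. parallel, distributed & accumulation) = 16" in the logs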
And here are the training job logs:
timestamp,message
1675636103415,"[INFO|tokenization_utils_base.py:1786] 2023-02-05 22:28:22,822 >> loading file https://huggingface.co/gpt2/resolve/main/tokenizer_config.json from cache at None"
1675636103415,"[INFO|tokenization_utils_base.py:1786] 2023-02-05 22:28:22,822 >> loading file https://huggingface.co/gpt2/resolve/main/tokenizer_config.json from cache at None"
1675636103415,"[INFO|configuration_utils.py:648] 2023-02-05 22:28:23,111 >> loading configuration file https://huggingface.co/gpt2/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/fc674cd6907b4c9e933cb42d67662436b89fa9540a1f40d7c919d0109289ad01.7d2e0efa5ca20cef4fb199382111e9d3ad96fd77b849e1d4bed13a66e1336f51"
1675636103415,"[INFO|configuration_utils.py:648] 2023-02-05 22:28:23,111 >> loading configuration file https://huggingface.co/gpt2/resolve/main/config.json from cache at /root/.cache/huggingface/transformers/fc674cd6907b4c9e933cb42d67662436b89fa9540a1f40d7c919d0109289ad01.7d2e0efa5ca20cef4fb199382111e9d3ad96fd77b849e1d4bed13a66e1336f51"
1675636103415,"[INFO|configuration_utils.py:684] 2023-02-05 22:28:23,112 >> Model config GPT2Config {
""_name_or_path"": ""gpt2"",
""activation_function"": ""gelu_new"",
""architectures"": [
""GPT2LMHeadModel""
],
""attn_pdrop"": 0.1,
""bos_token_id"": 50256,
""embd_pdrop"": 0.1,
""eos_token_id"": 50256,
""initializer_range"": 0.02,
""layer_norm_epsilon"": 1e-05,
""model_type"": ""gpt2"",
""n_ctx"": 1024,
""n_embd"": 768,
""n_head"": 12,
""n_inner"": null,
""n_layer"": 12,
""n_positions"": 1024,
""reorder_and_upcast_attn"": false,
""resid_pdrop"": 0.1,
""scale_attn_by_inverse_layer_idx"": false,
""scale_attn_weights"": true,
""summary_activation"": null,
""summary_first_dropout"": 0.1,
""summary_proj_to_labels"": true,
""summary_type"": ""cls_index"",
""summary_use_proj"": true,
""task_specific_params"": {
""text-generation"": {
""do_sample"": true,
""max_length"": 50
}
},
""transformers_version"": ""4.17.0"",
""use_cache"": true,
""vocab_size"": 50257"
1675636103415,}
1675636103416,"[INFO|configuration_utils.py:684] 2023-02-05 22:28:23,112 >> Model config GPT2Config {
""_name_or_path"": ""gpt2"",
""activation_function"": ""gelu_new"",
""architectures"": [
""GPT2LMHeadModel""
],
""attn_pdrop"": 0.1,
""bos_token_id"": 50256,
""embd_pdrop"": 0.1,
""eos_token_id"": 50256,
""initializer_range"": 0.02,
""layer_norm_epsilon"": 1e-05,
""model_type"": ""gpt2"",
""n_ctx"": 1024,
""n_embd"": 768,
""n_head"": 12,
""n_inner"": null,
""n_layer"": 12,
""n_positions"": 1024,
""reorder_and_upcast_attn"": false,
""resid_pdrop"": 0.1,
""scale_attn_by_inverse_layer_idx"": false,
""scale_attn_weights"": true,
""summary_activation"": null,
""summary_first_dropout"": 0.1,
""summary_proj_to_labels"": true,
""summary_type"": ""cls_index"",
""summary_use_proj"": true,
""task_specific_params"": {
""text-generation"": {
""do_sample"": true,
""max_length"": 50
}
},
""transformers_version"": ""4.17.0"",
""use_cache"": true,
""vocab_size"": 50257"
1675636103416,}
1675636104416,"[INFO|file_utils.py:2215] 2023-02-05 22:28:23,492 >> https://huggingface.co/gpt2/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpixj9yloj"
1675636104416,"[INFO|file_utils.py:2215] 2023-02-05 22:28:23,492 >> https://huggingface.co/gpt2/resolve/main/pytorch_model.bin not found in cache or force_download set to True, downloading to /root/.cache/huggingface/transformers/tmpixj9yloj"
1675636104416,"Downloading: 0%| | 0.00/523M [00:00<?, ?B/s]"
1675636104416,"Downloading: 1%| | 5.59M/523M [00:00<00:09, 58.6MB/s]"
1675636104416,"Downloading: 2%|β | 11.2M/523M [00:00<00:09, 56.3MB/s]"
1675636104416,"Downloading: 3%|β | 16.6M/523M [00:00<00:09, 56.1MB/s]"
1675636104417,"Downloading: 4%|β | 22.0M/523M [00:00<00:09, 56.5MB/s]"
1675636104417,"Downloading: 5%|β | 27.5M/523M [00:00<00:09, 56.7MB/s]"
1675636104417,"Downloading: 6%|β | 32.9M/523M [00:00<00:09, 56.9MB/s]"
1675636104417,"Downloading: 7%|β | 38.5M/523M [00:00<00:08, 57.5MB/s]"
1675636104417,"Downloading: 8%|β | 44.0M/523M [00:00<00:08, 56.6MB/s]"
1675636105417,"Downloading: 10%|β | 49.7M/523M [00:00<00:08, 57.7MB/s]"
1675636105417,"Downloading: 11%|β | 55.3M/523M [00:01<00:08, 57.0MB/s]"
1675636105417,"Downloading: 12%|ββ | 60.7M/523M [00:01<00:08, 56.4MB/s]"
1675636105417,"Downloading: 13%|ββ | 66.1M/523M [00:01<00:08, 53.3MB/s]"
1675636105417,"Downloading: 14%|ββ | 71.2M/523M [00:01<00:11, 40.3MB/s]"
1675636105417,"Downloading: 14%|ββ | 75.5M/523M [00:01<00:11, 39.1MB/s]"
1675636105417,"Downloading: 15%|ββ | 79.8M/523M [00:01<00:11, 40.3MB/s]"
1675636105417,"Downloading: 16%|ββ | 83.8M/523M [00:01<00:12, 36.3MB/s]"
1675636106418,"Downloading: 17%|ββ | 87.5M/523M [00:01<00:12, 36.4MB/s]"
1675636106418,"Downloading: 17%|ββ | 91.1M/523M [00:02<00:12, 36.8MB/s]"
1675636106418,"Downloading: 18%|ββ | 95.9M/523M [00:02<00:11, 40.4MB/s]"
1675636106418,"Downloading: 19%|ββ | 99.9M/523M [00:02<00:12, 36.5MB/s]"
1675636106418,"Downloading: 20%|ββ | 106M/523M [00:02<00:09, 44.4MB/s]"
1675636106418,"Downloading: 22%|βββ | 113M/523M [00:02<00:08, 50.7MB/s]"
1675636106418,"Downloading: 23%|βββ | 120M/523M [00:02<00:07, 56.5MB/s]"
1675636106418,"Downloading: 24%|βββ | 125M/523M [00:02<00:08, 51.1MB/s]"
1675636106418,"Downloading: 25%|βββ | 130M/523M [00:02<00:08, 51.2MB/s]"
1675636107418,"Downloading: 26%|βββ | 135M/523M [00:02<00:07, 51.6MB/s]"
1675636107418,"Downloading: 27%|βββ | 141M/523M [00:03<00:07, 55.2MB/s]"
1675636107419,"Downloading: 28%|βββ | 148M/523M [00:03<00:06, 59.3MB/s]"
1675636107419,"Downloading: 29%|βββ | 154M/523M [00:03<00:06, 60.0MB/s]"
1675636107419,"Downloading: 31%|βββ | 161M/523M [00:03<00:05, 63.7MB/s]"
1675636107419,"Downloading: 32%|ββββ | 168M/523M [00:03<00:05, 65.8MB/s]"
1675636107419,"Downloading: 33%|ββββ | 174M/523M [00:03<00:05, 63.8MB/s]"
1675636107419,"Downloading: 34%|ββββ | 180M/523M [00:03<00:05, 63.9MB/s]"
1675636107419,"Downloading: 36%|ββββ | 187M/523M [00:03<00:05, 65.9MB/s]"
1675636107419,"Downloading: 37%|ββββ | 193M/523M [00:03<00:05, 62.7MB/s]"
1675636108419,"Downloading: 38%|ββββ | 199M/523M [00:03<00:05, 62.6MB/s]"
1675636108419,"Downloading: 39%|ββββ | 206M/523M [00:04<00:05, 64.0MB/s]"
1675636108419,"Downloading: 41%|ββββ | 212M/523M [00:04<00:06, 50.6MB/s]"
1675636108419,"Downloading: 42%|βββββ | 217M/523M [00:04<00:06, 50.0MB/s]"
1675636108419,"Downloading: 43%|βββββ | 223M/523M [00:04<00:05, 53.3MB/s]"
1675636108419,"Downloading: 44%|βββββ | 229M/523M [00:04<00:05, 54.9MB/s]"
1675636108419,"Downloading: 45%|βββββ | 236M/523M [00:04<00:05, 59.5MB/s]"
1675636108419,"Downloading: 46%|βββββ | 242M/523M [00:04<00:04, 60.6MB/s]"
1675636108419,"Downloading: 47%|βββββ | 248M/523M [00:04<00:04, 61.1MB/s]"
1675636109420,"Downloading: 48%|βββββ | 254M/523M [00:04<00:04, 60.7MB/s]"
1675636109420,"Downloading: 50%|βββββ | 259M/523M [00:05<00:04, 56.5MB/s]"
1675636109420,"Downloading: 51%|βββββ | 266M/523M [00:05<00:04, 59.5MB/s]"
1675636109420,"Downloading: 52%|ββββββ | 272M/523M [00:05<00:04, 62.2MB/s]"
1675636109420,"Downloading: 53%|ββββββ | 278M/523M [00:05<00:04, 62.7MB/s]"
1675636109420,"Downloading: 55%|ββββββ | 285M/523M [00:05<00:03, 64.3MB/s]"
1675636109420,"Downloading: 56%|ββββββ | 292M/523M [00:05<00:03, 65.9MB/s]"
1675636109420,"Downloading: 57%|ββββββ | 298M/523M [00:05<00:03, 65.9MB/s]"
1675636109420,"Downloading: 58%|ββββββ | 304M/523M [00:05<00:03, 65.0MB/s]"
1675636110421,"Downloading: 59%|ββββββ | 310M/523M [00:05<00:03, 61.3MB/s]"
1675636110421,"Downloading: 61%|ββββββ | 316M/523M [00:06<00:03, 57.6MB/s]"
1675636110421,"Downloading: 62%|βββββββ | 322M/523M [00:06<00:03, 57.9MB/s]"
1675636110421,"Downloading: 63%|βββββββ | 328M/523M [00:06<00:04, 48.1MB/s]"
1675636110421,"Downloading: 64%|βββββββ | 334M/523M [00:06<00:03, 53.2MB/s]"
1675636110421,"Downloading: 65%|βββββββ | 341M/523M [00:06<00:03, 58.2MB/s]"
1675636110421,"Downloading: 66%|βββββββ | 347M/523M [00:06<00:03, 51.3MB/s]"
1675636110421,"Downloading: 68%|βββββββ | 354M/523M [00:06<00:03, 56.5MB/s]"
1675636110421,"Downloading: 69%|βββββββ | 360M/523M [00:06<00:02, 59.9MB/s]"
1675636111421,"Downloading: 70%|βββββββ | 367M/523M [00:06<00:02, 61.7MB/s]"
1675636111422,"Downloading: 71%|ββββββββ | 373M/523M [00:07<00:02, 62.6MB/s]"
1675636111422,"Downloading: 73%|ββββββββ | 379M/523M [00:07<00:02, 64.5MB/s]"
1675636111422,"Downloading: 74%|ββββββββ | 386M/523M [00:07<00:02, 64.7MB/s]"
1675636111422,"Downloading: 75%|ββββββββ | 392M/523M [00:07<00:02, 64.4MB/s]"
1675636111422,"Downloading: 76%|ββββββββ | 398M/523M [00:07<00:02, 65.2MB/s]"
1675636111422,"Downloading: 77%|ββββββββ | 405M/523M [00:07<00:01, 64.0MB/s]"
1675636111422,"Downloading: 79%|ββββββββ | 411M/523M [00:07<00:01, 60.4MB/s]"
1675636111422,"Downloading: 80%|ββββββββ | 417M/523M [00:07<00:01, 62.8MB/s]"
1675636112422,"Downloading: 81%|ββββββββ | 424M/523M [00:07<00:01, 63.8MB/s]"
1675636112422,"Downloading: 82%|βββββββββ | 430M/523M [00:08<00:01, 63.6MB/s]"
1675636112422,"Downloading: 83%|βββββββββ | 436M/523M [00:08<00:01, 60.5MB/s]"
1675636112422,"Downloading: 85%|βββββββββ | 443M/523M [00:08<00:01, 64.8MB/s]"
1675636112422,"Downloading: 86%|βββββββββ | 449M/523M [00:08<00:01, 64.9MB/s]"
1675636112422,"Downloading: 87%|βββββββββ | 455M/523M [00:08<00:01, 56.8MB/s]"
1675636112422,"Downloading: 88%|βββββββββ | 461M/523M [00:08<00:01, 58.2MB/s]"
1675636112422,"Downloading: 89%|βββββββββ | 467M/523M [00:08<00:01, 49.9MB/s]"
1675636112422,"Downloading: 91%|βββββββββ | 473M/523M [00:08<00:00, 53.5MB/s]"
1675636113423,"Downloading: 92%|ββββββββββ| 479M/523M [00:08<00:00, 57.0MB/s]"
1675636113423,"Downloading: 93%|ββββββββββ| 486M/523M [00:09<00:00, 61.3MB/s]"
1675636113423,"Downloading: 94%|ββββββββββ| 493M/523M [00:09<00:00, 63.5MB/s]"
1675636113423,"Downloading: 96%|ββββββββββ| 500M/523M [00:09<00:00, 65.2MB/s]"
1675636113423,"Downloading: 97%|ββββββββββ| 506M/523M [00:09<00:00, 66.7MB/s]"
1675636113423,"Downloading: 98%|ββββββββββ| 514M/523M [00:09<00:00, 69.7MB/s]"
1675636113423,"Downloading: 100%|ββββββββββ| 521M/523M [00:09<00:00, 70.7MB/s]"
1675636113423,"Downloading: 100%|ββββββββββ| 523M/523M [00:09<00:00, 57.3MB/s]"
1675636113423,"[INFO|file_utils.py:2219] 2023-02-05 22:28:33,090 >> storing https://huggingface.co/gpt2/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/752929ace039baa8ef70fe21cdf9ab9445773d20e733cf693d667982e210837e.323c769945a351daa25546176f8208b3004b6f563438a7603e7932bae9025925"
1675636113423,"[INFO|file_utils.py:2219] 2023-02-05 22:28:33,090 >> storing https://huggingface.co/gpt2/resolve/main/pytorch_model.bin in cache at /root/.cache/huggingface/transformers/752929ace039baa8ef70fe21cdf9ab9445773d20e733cf693d667982e210837e.323c769945a351daa25546176f8208b3004b6f563438a7603e7932bae9025925"
1675636113423,"[INFO|file_utils.py:2227] 2023-02-05 22:28:33,091 >> creating metadata file for /root/.cache/huggingface/transformers/752929ace039baa8ef70fe21cdf9ab9445773d20e733cf693d667982e210837e.323c769945a351daa25546176f8208b3004b6f563438a7603e7932bae9025925"
1675636113423,"[INFO|file_utils.py:2227] 2023-02-05 22:28:33,091 >> creating metadata file for /root/.cache/huggingface/transformers/752929ace039baa8ef70fe21cdf9ab9445773d20e733cf693d667982e210837e.323c769945a351daa25546176f8208b3004b6f563438a7603e7932bae9025925"
1675636113423,"[INFO|modeling_utils.py:1431] 2023-02-05 22:28:33,091 >> loading weights file https://huggingface.co/gpt2/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/752929ace039baa8ef70fe21cdf9ab9445773d20e733cf693d667982e210837e.323c769945a351daa25546176f8208b3004b6f563438a7603e7932bae9025925"
1675636113423,"[INFO|modeling_utils.py:1431] 2023-02-05 22:28:33,091 >> loading weights file https://huggingface.co/gpt2/resolve/main/pytorch_model.bin from cache at /root/.cache/huggingface/transformers/752929ace039baa8ef70fe21cdf9ab9445773d20e733cf693d667982e210837e.323c769945a351daa25546176f8208b3004b6f563438a7603e7932bae9025925"
1675636115424,"[INFO|modeling_utils.py:1702] 2023-02-05 22:28:35,010 >> All model checkpoint weights were used when initializing GPT2LMHeadModel."
1675636115424,"[INFO|modeling_utils.py:1710] 2023-02-05 22:28:35,010 >> All the weights of GPT2LMHeadModel were initialized from the model checkpoint at gpt2."
1675636115424,"If your task is similar to the task the model of the checkpoint was trained on, you can already use GPT2LMHeadModel for predictions without further training."
1675636115424,"[INFO|modeling_utils.py:1702] 2023-02-05 22:28:35,010 >> All model checkpoint weights were used when initializing GPT2LMHeadModel."
1675636115424,"[INFO|modeling_utils.py:1710] 2023-02-05 22:28:35,010 >> All the weights of GPT2LMHeadModel were initialized from the model checkpoint at gpt2."
1675636115424,"If your task is similar to the task the model of the checkpoint was trained on, you can already use GPT2LMHeadModel for predictions without further training."
1675636115424,"02/05/2023 22:28:35 - WARNING - datasets.fingerprint - Parameter 'function'=<function main.<locals>.tokenize_function at 0x7f1e0ad49ee0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed."
1675636115425,"Running tokenizer on dataset: 0%| | 0/1 [00:00<?, ?ba/s]"
1675636115425,"[WARNING|tokenization_utils_base.py:3397] 2023-02-05 22:28:35,087 >> Token indices sequence length is longer than the specified maximum sequence length for this model (1226 > 1024). Running this sequence through the model will result in indexing errors"
1675636115425,"[WARNING|tokenization_utils_base.py:3397] 2023-02-05 22:28:35,087 >> Token indices sequence length is longer than the specified maximum sequence length for this model (1226 > 1024). Running this sequence through the model will result in indexing errors"
1675636115425,"[WARNING|run_clm.py:378] 2023-02-05 22:28:35,087 >> ^^^^^^^^^^^^^^^^ Please ignore the warning above - this long input will be chunked into smaller bits before being passed to the model."
1675636115425,"[WARNING|run_clm.py:378] 2023-02-05 22:28:35,087 >> ^^^^^^^^^^^^^^^^ Please ignore the warning above - this long input will be chunked into smaller bits before being passed to the model."
1675636115425,02/05/2023 22:28:35 - INFO - datasets.arrow_dataset - Caching processed dataset at /root/.cache/huggingface/datasets/text/default-5872c4bdb0144370/0.0.0/08f6fb1dd2dab0a18ea441c359e1d63794ea8cb53e7863e6edf8fc5655e47ec4/cache-1c80317fa3b1799d.arrow
1675636115425,"Running tokenizer on dataset: 100%|ββββββββββ| 1/1 [00:00<00:00, 13.01ba/s]"
1675636115425,"02/05/2023 22:28:35 - INFO - datasets.fingerprint - Parameter 'function'=<function main.<locals>.tokenize_function at 0x7f1e0ad3ea60> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead."
1675636115425,"Running tokenizer on dataset: 0%| | 0/1 [00:00<?, ?ba/s]"
1675636115425,02/05/2023 22:28:35 - INFO - datasets.arrow_dataset - Caching processed dataset at /root/.cache/huggingface/datasets/text/default-5872c4bdb0144370/0.0.0/08f6fb1dd2dab0a18ea441c359e1d63794ea8cb53e7863e6edf8fc5655e47ec4/cache-bdd640fb06671ad1.arrow
1675636115425,"Running tokenizer on dataset: 100%|ββββββββββ| 1/1 [00:00<00:00, 106.55ba/s]"
1675636115425,"02/05/2023 22:28:35 - INFO - datasets.fingerprint - Parameter 'function'=<function main.<locals>.group_texts at 0x7f1e0ad49ee0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead."
1675636115425,"Grouping texts in chunks of 1024: 0%| | 0/1 [00:00<?, ?ba/s]"
1675636115425,02/05/2023 22:28:35 - INFO - datasets.arrow_dataset - Caching processed dataset at /root/.cache/huggingface/datasets/text/default-5872c4bdb0144370/0.0.0/08f6fb1dd2dab0a18ea441c359e1d63794ea8cb53e7863e6edf8fc5655e47ec4/cache-3eb13b9046685257.arrow
1675636115425,"Grouping texts in chunks of 1024: 100%|ββββββββββ| 1/1 [00:00<00:00, 98.70ba/s]"
1675636115425,"02/05/2023 22:28:35 - INFO - datasets.fingerprint - Parameter 'function'=<function main.<locals>.group_texts at 0x7f1e0ad49ee0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead."
1675636115425,"Grouping texts in chunks of 1024: 0%| | 0/1 [00:00<?, ?ba/s]"
1675636115425,02/05/2023 22:28:35 - INFO - datasets.arrow_dataset - Caching processed dataset at /root/.cache/huggingface/datasets/text/default-5872c4bdb0144370/0.0.0/08f6fb1dd2dab0a18ea441c359e1d63794ea8cb53e7863e6edf8fc5655e47ec4/cache-23b8c1e9392456de.arrow
1675636115425,"Grouping texts in chunks of 1024: 100%|ββββββββββ| 1/1 [00:00<00:00, 100.88ba/s]"
1675636116425,"02/05/2023 22:28:35 - INFO - datasets.utils.file_utils - https://raw.githubusercontent.com/huggingface/datasets/1.18.4/metrics/accuracy/accuracy.py not found in cache or force_download set to True, downloading to /root/.cache/huggingface/datasets/downloads/tmpat4_lqt5"
1675636116425,"Downloading: 0%| | 0.00/1.41k [00:00<?, ?B/s]"
1675636116425,"Downloading: 3.19kB [00:00, 2.15MB/s]"
1675636116426,02/05/2023 22:28:35 - INFO - datasets.utils.file_utils - storing https://raw.githubusercontent.com/huggingface/datasets/1.18.4/metrics/accuracy/accuracy.py in cache at /root/.cache/huggingface/datasets/downloads/18ec2a1ed9dbcfd6ecff70a4f0d0d33fd5cc40c51c3c816376dc3d0b3e30219f.6913c0dc30de3cef9d6bc88cc182661800cb937f0fe5b01ffa731617105a32ac.py
1675636116426,02/05/2023 22:28:35 - INFO - datasets.utils.file_utils - creating metadata file for /root/.cache/huggingface/datasets/downloads/18ec2a1ed9dbcfd6ecff70a4f0d0d33fd5cc40c51c3c816376dc3d0b3e30219f.6913c0dc30de3cef9d6bc88cc182661800cb937f0fe5b01ffa731617105a32ac.py
1675636120427,"/opt/conda/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn("
1675636120427,"[INFO|trainer.py:1279] 2023-02-05 22:28:40,375 >> ***** Running training *****"
1675636120427,"[INFO|trainer.py:1279] 2023-02-05 22:28:40,375 >> ***** Running training *****"
1675636120427,"[INFO|trainer.py:1280] 2023-02-05 22:28:40,375 >> Num examples = 5"
1675636120427,"[INFO|trainer.py:1281] 2023-02-05 22:28:40,375 >> Num Epochs = 3"
1675636120427,"[INFO|trainer.py:1282] 2023-02-05 22:28:40,375 >> Instantaneous batch size per device = 2"
1675636120427,"[INFO|trainer.py:1280] 2023-02-05 22:28:40,375 >> Num examples = 5"
1675636120427,"[INFO|trainer.py:1281] 2023-02-05 22:28:40,375 >> Num Epochs = 3"
1675636120427,"[INFO|trainer.py:1282] 2023-02-05 22:28:40,375 >> Instantaneous batch size per device = 2"
1675636120427,"[INFO|trainer.py:1283] 2023-02-05 22:28:40,375 >> Total train batch size (w. parallel, distributed & accumulation) = 16"
1675636120427,"[INFO|trainer.py:1284] 2023-02-05 22:28:40,375 >> Gradient Accumulation steps = 8"
1675636120427,"[INFO|trainer.py:1285] 2023-02-05 22:28:40,375 >> Total optimization steps = 3"
1675636120427,"[INFO|trainer.py:1283] 2023-02-05 22:28:40,375 >> Total train batch size (w. parallel, distributed & accumulation) = 16"
1675636120427,"[INFO|trainer.py:1284] 2023-02-05 22:28:40,375 >> Gradient Accumulation steps = 8"
1675636120427,"[INFO|trainer.py:1285] 2023-02-05 22:28:40,375 >> Total optimization steps = 3"
1675636120427,"0%| | 0/3 [00:00<?, ?it/s]"
1675636121428,[2023-02-05 22:28:40.545 algo-1:49 INFO utils.py:27] RULE_JOB_STOP_SIGNAL_FILENAME: None
1675636121428,[2023-02-05 22:28:40.716 algo-1:49 INFO profiler_config_parser.py:111] User has disabled profiler.
1675636121428,[2023-02-05 22:28:40.717 algo-1:49 INFO json_config.py:91] Creating hook from json_config at /opt/ml/input/config/debughookconfig.json.
1675636121428,[2023-02-05 22:28:40.718 algo-1:49 INFO hook.py:201] tensorboard_dir has not been set for the hook. SMDebug will not be exporting tensorboard summaries.
1675636121428,[2023-02-05 22:28:40.718 algo-1:49 INFO hook.py:254] Saving to /opt/ml/output/tensors
1675636121428,[2023-02-05 22:28:40.718 algo-1:49 INFO state_store.py:77] The checkpoint config file /opt/ml/input/config/checkpointconfig.json does not exist.
1675636122428,"33%|ββββ | 1/3 [00:01<00:03, 1.56s/it]"
1675636123429,"67%|βββββββ | 2/3 [00:02<00:00, 1.03it/s]"
1675636123429,"100%|ββββββββββ| 3/3 [00:02<00:00, 1.28it/s]"
1675636123429,"[INFO|trainer.py:1508] 2023-02-05 22:28:43,047 >> "
1675636123429,Training completed. Do not forget to share your model on huggingface.co/models =)
1675636123429,"[INFO|trainer.py:1508] 2023-02-05 22:28:43,047 >> "
1675636123429,Training completed. Do not forget to share your model on huggingface.co/models =)
1675636123429,"{'train_runtime': 2.6716, 'train_samples_per_second': 5.615, 'train_steps_per_second': 1.123, 'train_loss': 1.36589781443278, 'epoch': 3.0}"
1675636123429,"100%|ββββββββββ| 3/3 [00:02<00:00, 1.28it/s]"
1675636123429,"100%|ββββββββββ| 3/3 [00:02<00:00, 1.12it/s]"
1675636123429,"[INFO|trainer.py:2139] 2023-02-05 22:28:43,048 >> Saving model checkpoint to /opt/ml/model"
1675636123429,"[INFO|trainer.py:2139] 2023-02-05 22:28:43,048 >> Saving model checkpoint to /opt/ml/model"
1675636123429,"[INFO|configuration_utils.py:439] 2023-02-05 22:28:43,049 >> Configuration saved in /opt/ml/model/config.json"
1675636123429,"[INFO|configuration_utils.py:439] 2023-02-05 22:28:43,049 >> Configuration saved in /opt/ml/model/config.json"
1675636124429,"[INFO|modeling_utils.py:1084] 2023-02-05 22:28:43,964 >> Model weights saved in /opt/ml/model/pytorch_model.bin"
1675636124430,"[INFO|modeling_utils.py:1084] 2023-02-05 22:28:43,964 >> Model weights saved in /opt/ml/model/pytorch_model.bin"
1675636124430,"[INFO|tokenization_utils_base.py:2094] 2023-02-05 22:28:43,965 >> tokenizer config file saved in /opt/ml/model/tokenizer_config.json"
1675636124430,"[INFO|tokenization_utils_base.py:2094] 2023-02-05 22:28:43,965 >> tokenizer config file saved in /opt/ml/model/tokenizer_config.json"
1675636124430,"[INFO|tokenization_utils_base.py:2100] 2023-02-05 22:28:43,965 >> Special tokens file saved in /opt/ml/model/special_tokens_map.json"
1675636124430,"[INFO|tokenization_utils_base.py:2100] 2023-02-05 22:28:43,965 >> Special tokens file saved in /opt/ml/model/special_tokens_map.json"
1675636124430,***** train metrics *****
1675636124430,"epoch = 3.0
train_loss = 1.3659
train_runtime = 0:00:02.67
train_samples = 5
train_samples_per_second = 5.615
train_steps_per_second = 1.123"
1675636124430,02/05/2023 22:28:44 - INFO - __main__ - *** Evaluate ***
1675636124430,"[INFO|trainer.py:2389] 2023-02-05 22:28:44,077 >> ***** Running Evaluation *****"
1675636124430,"[INFO|trainer.py:2389] 2023-02-05 22:28:44,077 >> ***** Running Evaluation *****"
1675636124430,"[INFO|trainer.py:2391] 2023-02-05 22:28:44,077 >> Num examples = 5"
1675636124430,"[INFO|trainer.py:2394] 2023-02-05 22:28:44,077 >> Batch size = 2"
1675636124430,"[INFO|trainer.py:2391] 2023-02-05 22:28:44,077 >> Num examples = 5"
1675636124430,"[INFO|trainer.py:2394] 2023-02-05 22:28:44,077 >> Batch size = 2"
1675636124430,"0%| | 0/3 [00:00<?, ?it/s]"
1675636124430,"100%|ββββββββββ| 3/3 [00:00<00:00, 26.31it/s]"
1675636124430,02/05/2023 22:28:44 - INFO - datasets.metric - Removing /root/.cache/huggingface/metrics/accuracy/default/default_experiment-1-0.arrow
1675636124430,"100%|ββββββββββ| 3/3 [00:00<00:00, 20.81it/s]"
1675636124430,***** eval metrics *****
1675636124430,"epoch = 3.0
eval_accuracy = 0.6016
eval_loss = 2.4612
eval_runtime = 0:00:00.21
eval_samples = 5
eval_samples_per_second = 22.761
eval_steps_per_second = 13.657
perplexity = 11.7194"
1675636125431,"[INFO|modelcard.py:460] 2023-02-05 22:28:44,666 >> Dropping the following result as it does not have all the necessary fields:"
1675636125431,"{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.601564027370479}]}"
1675636125431,"[INFO|modelcard.py:460] 2023-02-05 22:28:44,666 >> Dropping the following result as it does not have all the necessary fields:"
1675636125431,"{'task': {'name': 'Causal Language Modeling', 'type': 'text-generation'}, 'metrics': [{'name': 'Accuracy', 'type': 'accuracy', 'value': 0.601564027370479}]}"
1675636125431,"2023-02-05 22:28:45,239 sagemaker-training-toolkit INFO Waiting for the process to finish and give a return code."
1675636125431,"2023-02-05 22:28:45,239 sagemaker-training-toolkit INFO Done waiting for a return code. Received 0 from exiting process."
1675636125431,"2023-02-05 22:28:45,240 sagemaker-training-toolkit INFO Reporting training SUCCESS"