Below are the logs from my fine-tuning run. After some time it failed with the error shown at the end of the log; please take a look and suggest an appropriate solution.
Thanks!
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards:  50%|█████     | 1/2 [00:03<00:03, 3.15s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.62s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:03<00:00, 1.85s/it]
/app/env/lib/python3.9/site-packages/transformers/utils/hub.py:374: FutureWarning: The use_auth_token
argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
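(Side note: the FutureWarning above is harmless, but it can be silenced by passing `token` instead of the deprecated `use_auth_token`. A minimal sketch of migrating old call sites; the helper name is mine, not part of Transformers:)

```python
def upgrade_auth_kwargs(kwargs):
    """Rename the deprecated `use_auth_token` kwarg to `token`.

    Hypothetical helper: Transformers deprecated `use_auth_token`
    in favour of `token`, so migrate the key before calling e.g.
    AutoModelForCausalLM.from_pretrained(model_id, **kwargs).
    """
    if "use_auth_token" in kwargs:
        # keep an explicit `token` if one is already set
        kwargs.setdefault("token", kwargs.pop("use_auth_token"))
    return kwargs

# legacy-style kwargs as older fine-tuning code might build them
migrated = upgrade_auth_kwargs({"use_auth_token": "hf_xxx", "revision": "main"})
```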
Downloading (…)neration_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]
Downloading (…)neration_config.json: 100%|██████████| 188/188 [00:00<00:00, 87.2kB/s]
INFO Using block size 1024
INFO creating trainer
  0%|          | 0/228 [00:00<?, ?it/s]You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__`
method is faster than using a method to encode the text followed by a call to the `pad`
method to get a padded encoding.
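(For reference, the tokenizer note above means batch inputs should go through the tokenizer's `__call__` rather than per-text `encode` plus `pad`. The sketch below imitates the padding step with a plain-Python helper, since the real call needs a downloaded tokenizer; names are illustrative:)

```python
def pad_batch(seqs, pad_id=0):
    """Right-pad variable-length token-id lists to equal length,
    roughly what `tokenizer(texts, padding=True)` does in one pass."""
    width = max(len(s) for s in seqs)
    return [s + [pad_id] * (width - len(s)) for s in seqs]

# With a fast tokenizer, prefer the one-shot call (not run here):
#   batch = tokenizer(texts, padding=True, truncation=True,
#                     max_length=1024, return_tensors="pt")
# over encoding each text and then calling tokenizer.pad(...).
padded = pad_batch([[1, 2, 3], [4, 5]])
```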
/app/env/lib/python3.9/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
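(The `use_reentrant` UserWarning is also benign; it goes away once the flag is passed explicitly, and `False` is the variant the PyTorch docs recommend. A minimal torch sketch; in a Transformers trainer the rough equivalent is passing `gradient_checkpointing_kwargs={"use_reentrant": False}`, assuming a version recent enough to accept it:)

```python
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    # stand-in for a transformer layer under gradient checkpointing
    return torch.relu(x) * 2.0

x = torch.randn(4, requires_grad=True)
# Passing use_reentrant explicitly silences the UserWarning.
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()
```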
  0%|          | 1/228 [00:04<16:11, 4.28s/it]
  1%|          | 2/228 [00:07<13:34, 3.60s/it]
  1%|█         | 3/228 [00:10<12:42, 3.39s/it]
  2%|█         | 4/228 [00:13<12:16, 3.29s/it]
  2%|█         | 5/228 [00:16<12:00, 3.23s/it]
  3%|█         | 6/228 [00:20<12:04, 3.26s/it]
  3%|█         | 7/228 [00:23<11:51, 3.22s/it]
  4%|█         | 8/228 [00:26<11:41, 3.19s/it]
  4%|█         | 9/228 [00:29<11:34, 3.17s/it]
  4%|█         | 10/228 [00:32<11:28, 3.16s/it]
{'loss': 1.4006, 'learning_rate': 0.00013043478260869567, 'epoch': 2.07}
{'train_runtime': 48.6969, 'train_samples_per_second': 9.364, 'train_steps_per_second': 4.682, 'train_loss': 1.4006022135416667, 'epoch': 2.07}
  5%|█         | 11/228 [00:35<11:36, 3.21s/it]
  5%|█         | 12/228 [00:39<11:28, 3.19s/it]
  6%|█         | 13/228 [00:42<11:21, 3.17s/it]
  6%|█         | 14/228 [00:45<11:16, 3.16s/it]
  7%|█         | 15/228 [00:48<11:11, 3.15s/it]
  7%|█         | 15/228 [00:48<11:31, 3.25s/it]
INFO    Finished training, saving model…
INFO    Pushing model to hub…
adapter_model.bin:   0%|          | 0.00/33.6M [00:00<?, ?B/s]
adapter_model.bin:   0%|          | 0.00/33.6M [00:00<?, ?B/s]
rng_state.pth:   0%|          | 0.00/14.2k [00:00<?, ?B/s]
Upload 10 LFS files:   0%|          | 0/10 [00:00<?, ?it/s]
optimizer.pt:   0%|          | 0.00/67.2M [00:00<?, ?B/s]
scheduler.pt:   0%|          | 0.00/1.06k [00:00<?, ?B/s]
adapter_model.bin:   0%|          | 8.19k/33.6M [00:00<08:43, 64.2kB/s]
rng_state.pth:  58%|██████    | 8.19k/14.2k [00:00<00:00, 63.7kB/s]
optimizer.pt:   0%|          | 8.19k/67.2M [00:00<17:24, 64.3kB/s]
adapter_model.bin:   0%|          | 8.19k/33.6M [00:00<09:04, 61.7kB/s]
scheduler.pt: 100%|██████████| 1.06k/1.06k [00:00<00:00, 8.23kB/s]
scheduler.pt: 100%|██████████| 1.06k/1.06k [00:00<00:00, 5.47kB/s]
adapter_model.bin:  15%|██        | 4.90M/33.6M [00:00<00:01, 25.8MB/s]
optimizer.pt:   8%|█         | 5.17M/67.2M [00:00<00:02, 27.3MB/s]
adapter_model.bin:  15%|██        | 5.14M/33.6M [00:00<00:01, 26.6MB/s]
rng_state.pth: 100%|██████████| 14.2k/14.2k [00:00<00:00, 56.3kB/s]
optimizer.pt:  16%|██        | 10.6M/67.2M [00:00<00:01, 39.0MB/s]
adapter_model.bin:  26%|███       | 8.71M/33.6M [00:00<00:00, 29.3MB/s]
adapter_model.bin:  26%|███       | 8.63M/33.6M [00:00<00:00, 29.0MB/s]
tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]
training_args.bin:   0%|          | 0.00/4.54k [00:00<?, ?B/s]
tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 7.76MB/s]
training_args.bin: 100%|██████████| 4.54k/4.54k [00:00<00:00, 111kB/s]
adapter_model.bin:  48%|█████     | 16.0M/33.6M [00:00<00:00, 35.4MB/s]
adapter_model.bin:  48%|█████     | 16.0M/33.6M [00:00<00:00, 33.2MB/s]
events.out.tfevents.1697452891.s-ravivishwakarmauzio-autotrain-o3pc-15wq-atys-0-c07f1-797z75jm.113.0:   0%|          | 0.00/5.05k [00:00<?, ?B/s]
optimizer.pt:  24%|███       | 16.0M/67.2M [00:00<00:01, 30.0MB/s]
events.out.tfevents.1697452891.s-ravivishwakarmauzio-autotrain-o3pc-15wq-atys-0-c07f1-797z75jm.113.0: 100%|██████████| 5.05k/5.05k [00:00<00:00, 223kB/s]
tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]
adapter_model.bin:  81%|████████  | 27.2M/33.6M [00:00<00:00, 57.5MB/s]
adapter_model.bin:  68%|███████   | 22.7M/33.6M [00:00<00:00, 42.5MB/s]
optimizer.pt:  34%|████      | 22.7M/67.2M [00:00<00:01, 39.8MB/s]
training_args.bin:   0%|          | 0.00/4.54k [00:00<?, ?B/s]
tokenizer.model: 100%|██████████| 500k/500k [00:00<00:00, 2.50MB/s]
training_args.bin: 100%|██████████| 4.54k/4.54k [00:00<00:00, 149kB/s]
adapter_model.bin:  99%|██████████| 33.4M/33.6M [00:00<00:00, 49.4MB/s]
adapter_model.bin: 100%|██████████| 33.6M/33.6M [00:00<00:00, 38.6MB/s]
optimizer.pt:  48%|█████     | 32.0M/67.2M [00:00<00:00, 37.7MB/s]
adapter_model.bin:  95%|██████████| 32.0M/33.6M [00:01<00:00, 32.0MB/s]
adapter_model.bin: 100%|██████████| 33.6M/33.6M [00:01<00:00, 29.3MB/s]
Upload 10 LFS files:  10%|█         | 1/10 [00:01<00:12, 1.35s/it]
optimizer.pt:  71%|████████  | 48.0M/67.2M [00:01<00:00, 32.6MB/s]
optimizer.pt:  95%|██████████| 64.0M/67.2M [00:01<00:00, 43.8MB/s]
optimizer.pt: 100%|██████████| 67.2M/67.2M [00:01<00:00, 36.6MB/s]
Upload 10 LFS files:  30%|███       | 3/10 [00:01<00:04, 1.72it/s]
Upload 10 LFS files: 100%|██████████| 10/10 [00:01<00:00, 5.06it/s]
INFO    Pausing space…
error: code = NotFound desc = an error occurred when try to find container "79ee03e91c012511e778c348a0fedd2d164fc1d394378d8d52fe2956d80219c0": not found