Hi, I’ve got a definite beginner situation/question.
I trained a model overnight with the Trainer API. It seemed to finish, since I got the "Training completed. Do not forget to share your model on huggingface.co/models =)" message.
I then wanted to run some predictions within the same script. However, that didn’t appear to happen: the script stopped, and the last thing in my log is the leaked semaphore objects warning.
I also forgot to include an explicit trainer.save_model call in the script.
My questions:
- Is there a way to retrieve the trained model and use it for predictions? I can see checkpoints in my test-trainer folder and a reference to the model in the cache.
- Why did I get the leaked semaphore objects warning if the training finished?
Any help would be appreciated.
In case more detail helps, I copied the tail of my log below, starting from the Training completed message (with unhelpful lines removed).
100%|██████████| 911/911 [32:52<00:00, 1.92s/it]
Training completed. Do not forget to share your model on huggingface.co/models =)
100%|██████████| 3984/3984 [15:16:58<00:00, 12.06s/it]
100%|██████████| 3984/3984 [15:16:58<00:00, 13.81s/it]
The following columns in the test set don't have a corresponding argument in `XLNetForSequenceClassification.forward` and have been ignored: hypothesis, idx, premise. If hypothesis, idx, premise are not expected by `XLNetForSequenceClassification.forward`, you can safely ignore this message.
***** Running Prediction *****
Num examples = 7285
Batch size = 8
{'eval_loss': 0.06020349636673927, 'eval_runtime': 1975.7072, 'eval_samples_per_second': 3.687, 'eval_steps_per_second': 0.461, 'epoch': 3.0}
{'train_runtime': 55018.864, 'train_samples_per_second': 0.579, 'train_steps_per_second': 0.072, 'train_loss': 0.05098113381719015, 'epoch': 3.0}
0%| | 0/911 [00:00<?, ?it/s]
0%| | 2/911 [00:03<22:50, 1.51s/it]
0%| | 3/911 [00:05<32:03, 2.12s/it]
0%| | 4/911 [00:08<35:46, 2.37s/it]
1%| | 5/911 [00:11<39:23, 2.61s/it]
1%| | 6/911 [00:14<41:11, 2.73s/it]
1%| | 7/911 [00:16<33:51, 2.25s/it]
1%| | 8/911 [00:17<28:34, 1.90s/it]
1%| | 9/911 [00:18<24:58, 1.66s/it]
1%| | 10/911 [00:19<22:09, 1.48s/it]
...
99%|█████████▉| 906/911 [32:10<00:10, 2.02s/it]
100%|█████████▉| 907/911 [32:12<00:08, 2.05s/it]
100%|█████████▉| 908/911 [32:14<00:06, 2.12s/it]
100%|█████████▉| 909/911 [32:16<00:04, 2.19s/it]
100%|█████████▉| 910/911 [32:19<00:02, 2.21s/it]
100%|██████████| 911/911 [32:20<00:00, 1.99s/it]
loading configuration file https://huggingface.co/ynie/xlnet-large-cased-snli_mnli_fever_anli_R1_R2_R3-nli/resolve/main/config.json from cache at /Users/<USER>/.cache/huggingface/transformers/6c94c94c14efab475d1f94dd6e8db89c88d795ee247d6d8bc2abcf08e0a0ffd0.daa7acdd41354d5f480660f3a1afeaf69ccac9a5c013e173e2f5b557a777eaa4
Model config XLNetConfig {
"_name_or_path": "ynie/xlnet-large-cased-snli_mnli_fever_anli_R1_R2_R3-nli",
"architectures": [
"XLNetForSequenceClassification"
],
"attn_type": "bi",
"bi_data": false,
"bos_token_id": 1,
"clamp_len": -1,
"d_head": 64,
"d_inner": 4096,
"d_model": 1024,
"dropout": 0.1,
"end_n_top": 5,
"eos_token_id": 2,
"ff_activation": "gelu",
"id2label": {
"0": "entailment",
"1": "neutral",
"2": "contradiction"
},
"initializer_range": 0.02,
"label2id": {
"contradiction": 2,
"entailment": 0,
"neutral": 1
},
"layer_norm_eps": 1e-12,
"mem_len": null,
"model_type": "xlnet",
"n_head": 16,
"n_layer": 24,
"pad_token_id": 5,
"reuse_len": null,
"same_length": false,
"start_n_top": 5,
"summary_activation": "tanh",
"summary_last_dropout": 0.1,
"summary_type": "last",
"summary_use_proj": true,
"task_specific_params": {
"text-generation": {
"do_sample": true,
"max_length": 250
}
},
"transformers_version": "4.17.0",
"untie_r": true,
"use_mems_eval": true,
"use_mems_train": false,
"vocab_size": 32000
}
loading weights file https://huggingface.co/ynie/xlnet-large-cased-snli_mnli_fever_anli_R1_R2_R3-nli/resolve/main/pytorch_model.bin from cache at /Users/<USER>/.cache/huggingface/transformers/497700d1fcba7e3645179d764fdfb1876debe562dc47f01626383396694a7a44.8fd318050b4dc29b0a5933a2c5a73385fe6607522f2ca1622e82339745b920b8
All model checkpoint weights were used when initializing XLNetForSequenceClassification.
All the weights of XLNetForSequenceClassification were initialized from the model checkpoint at ynie/xlnet-large-cased-snli_mnli_fever_anli_R1_R2_R3-nli.
If your task is similar to the task the model of the checkpoint was trained on, you can already use XLNetForSequenceClassification for predictions without further training.
/Users/<USER>/opt/anaconda3/envs/nlu/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
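
For the first question, this is roughly what I would try next. It is only a minimal sketch, not something I have confirmed works on my run. It assumes the Trainer wrote its periodic checkpoints as checkpoint-<step> subfolders of test-trainer, and that each one holds a config.json plus the weights so that from_pretrained can read it. The helper names latest_checkpoint and load_model_for_prediction are my own, not part of the library.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subfolder with the highest step, or None."""
    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = []
    for child in Path(output_dir).iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            # Compare steps as integers: a plain string sort would rank
            # checkpoint-500 above checkpoint-3500.
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]


def load_model_for_prediction(output_dir="test-trainer"):
    """Load the newest checkpoint as a sequence classification model."""
    # Imported here so latest_checkpoint works without transformers installed.
    from transformers import AutoModelForSequenceClassification

    checkpoint = latest_checkpoint(output_dir)
    if checkpoint is None:
        raise FileNotFoundError(f"no checkpoint-<step> folders in {output_dir}")
    # Assumption: the folder contains config.json and the saved weights.
    return AutoModelForSequenceClassification.from_pretrained(str(checkpoint))
```

Calling load_model_for_prediction() should return a model that can be passed to a new Trainer for predict. One caveat: the newest checkpoint is not necessarily the final state. If save_steps was left at 500 (an assumption on my part, I did not check my arguments), the newest folder would be checkpoint-3500, a little short of the last step, 3984.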