Do I need to apply the softmax function to my logit before calculating the CrossEntropyLoss? | 1 | 3214 | October 15, 2020
Finetuned model generating test label exactly | 0 | 462 | October 15, 2020
Understanding <bos> Token in GPT2 Training | 0 | 422 | October 16, 2020
Clarification for the forward function of the SequenceSummary class from modeling_utils.py | 0 | 368 | October 16, 2020
For the logits from HuggingFace Transformer models, can the sum of the elements of the logit vector be greater than 1? | 1 | 1609 | October 16, 2020
Training GPT2 on CPUs? | 4 | 1668 | October 17, 2020
Why do different tokenizers use different vocab files? | 0 | 1774 | October 18, 2020
More complex training setups | 4 | 1014 | October 18, 2020
Hyperparameter for distil bert | 0 | 667 | October 19, 2020
Resume Training / Finetune a language model and further finetune a classifier | 1 | 1259 | October 19, 2020
What is the proper way to do inference using fine-tuned model? | 1 | 329 | October 19, 2020
Adding a new model to Transformers with additional dependencies | 15 | 1456 | October 19, 2020
How to fine-tune the output head of the pre-trained Transformer models? | 0 | 487 | October 19, 2020
Are the weights of the maskedLM head of the `BertForMaskedLM` model pre-trained? | 0 | 417 | October 19, 2020
`add_prefix_space=True` option for the BPE tokenizer | 0 | 1668 | October 19, 2020
How to extract the "student" model after distillation? | 2 | 876 | October 19, 2020
Distillation code works on TPU? | 0 | 310 | October 19, 2020
Load torchtext.data.dataset.Dataset to Trainer | 0 | 558 | October 20, 2020
Pretrain encoder of tf T5 model | 0 | 529 | October 19, 2020
BART for Portuguese | 7 | 1689 | October 20, 2020
Converting Transformers model to Tensorflow | 2 | 778 | October 20, 2020
Load fine tuned model from local | 4 | 10274 | October 20, 2020
Fine-tuning distiBART | 2 | 753 | October 20, 2020
[pegasus] evaluation datasets and build scripts are now available | 0 | 2031 | October 21, 2020
Docker container, run model only | 0 | 1132 | October 21, 2020
Optimizing models using ONNX | 1 | 1114 | October 21, 2020
Model giving same output for eval function but trains | 1 | 1420 | October 21, 2020
Loading pretrained SentencePiece tokenizer from Fairseq | 5 | 6357 | October 21, 2020
Load/save HF block sparse model | 1 | 397 | October 21, 2020
TransformerXL on Custom Language | 1 | 250 | October 21, 2020