| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Trainer class, compute_metrics and EvalPrediction | 6 | 14637 | October 28, 2020 |
| Can't use DistributedDataParallel for training the EncoderDecoderModel | 2 | 5496 | October 27, 2020 |
| Forward-looking or left-context attention mask (left-to-right) generation with BertGeneration and RobertaForCausalLM | 3 | 1359 | October 27, 2020 |
| RAG: Do we need to pretrain the doc-encoder when using a custom dataset? | 0 | 643 | October 26, 2020 |
| How to integrate an AzureMLCallback for logging in Azure? | 4 | 1518 | October 26, 2020 |
| Getting output attentions for encoder_attention decoder layers | 0 | 358 | October 24, 2020 |
| RuntimeError: arguments are located on different GPUs | 2 | 1869 | October 24, 2020 |
| Running a Trainer in DistributedDataParallel mode | 1 | 1452 | October 24, 2020 |
| Convert new T5 checkpoints released from Google (NaturalQuestion dataset) | 3 | 1492 | October 18, 2020 |
| Passing the tokenizer to Trainer for bucketing does not work for evaluation set | 5 | 1632 | October 23, 2020 |
| RAG Class for Question Answering | 0 | 422 | October 22, 2020 |
| How to use the Rostlab/prot_bert fill-mask pipeline | 1 | 570 | October 22, 2020 |
| Docker container, run model only | 0 | 1149 | October 21, 2020 |
| Converting Transformers model to Tensorflow | 2 | 790 | October 20, 2020 |
| BART for Portuguese | 7 | 1703 | October 20, 2020 |
| `add_prefix_space=True` option for the BPE tokenizer | 0 | 1753 | October 19, 2020 |
| Are the weights of the maskedLM head of the `BertForMaskedLM` model pre-trained? | 0 | 419 | October 19, 2020 |
| How to fine-tune the output head of the pre-trained Transformer models? | 0 | 493 | October 19, 2020 |
| Adding a new model to Transformers with additional dependencies | 15 | 1464 | October 19, 2020 |
| More complex training setups | 4 | 1023 | October 18, 2020 |
| Why do different tokenizers use different vocab files? | 0 | 1804 | October 18, 2020 |
| Training GPT2 on CPUs? | 4 | 1683 | October 17, 2020 |
| For the logits from HuggingFace Transformer models, can the sum of the elements of the logit vector be greater than 1? | 1 | 1624 | October 16, 2020 |
| Clarification for the forward function of the SequenceSummary class from modeling_utils.py | 0 | 371 | October 16, 2020 |
| Do I need to apply the softmax function to my logit before calculating the CrossEntropyLoss? | 1 | 3251 | October 15, 2020 |
| Keeping some tokens untranslated | 0 | 568 | October 15, 2020 |
| Getting predictions | 1 | 288 | October 15, 2020 |
| Distillation: create student model from a different base model than teacher | 9 | 2112 | October 14, 2020 |
| Is there any way to control the input of a `Longformer` layer? | 1 | 253 | October 14, 2020 |
| I'm getting "nan" value for loss, while following a tutorial from the documentation | 0 | 682 | October 14, 2020 |