Topic | Replies | Views | Activity
T5 for Named Entity Recognition | 2 | 6296 | November 24, 2020
Accuracy changes dramatically | 0 | 561 | November 23, 2020
Token classification probability and scoring | 0 | 743 | November 23, 2020
How to train TFT5ForConditionalGeneration model? | 5 | 3312 | November 21, 2020
How to create the warmup and decay from the BERT/Roberta papers? | 2 | 7310 | November 18, 2020
Initializing the weights of the final layer of e.g. BertForTokenClassification with a manual seed | 2 | 7853 | October 6, 2020
Convert mT5 to HF weights? | 6 | 991 | November 17, 2020
mBART finetuning tips/post-mortem | 6 | 2616 | November 17, 2020
Abbreviation expansions | 0 | 730 | November 17, 2020
Evaluation metrics | 1 | 1996 | November 16, 2020
Learning rate setting | 1 | 1932 | November 16, 2020
New Model sharing and uploading is extremely slow | 2 | 3528 | November 16, 2020
GPT2 with TensorFlow? | 1 | 370 | November 14, 2020
Distributed Training on Databricks | 0 | 894 | November 14, 2020
Custom DistilBertTokenizer training | 3 | 652 | November 13, 2020
DPR retriever module | 1 | 831 | November 6, 2020
Transformers v4.0.0 announcement | 2 | 2242 | November 12, 2020
Clarification: finetune.py max target length | 2 | 442 | November 12, 2020
Gradient accumulation averages over gradient | 2 | 1996 | November 12, 2020
BERT2BERT Notebook for Models without GenerationMixin | 0 | 285 | November 12, 2020
Num_beams: Faster Summarization without Distillation | 1 | 576 | November 12, 2020
Why does PretrainedConfig.use_cache default to True? | 0 | 494 | November 11, 2020
BartForConditionalGeneration "logits" shape is wrong/unexpected | 4 | 911 | November 11, 2020
How to evaluate T5 on classification task in case of multiple tasks | 0 | 591 | November 11, 2020
Seq2Seq Distillation: train_distilbart_xsum error | 5 | 433 | November 10, 2020
Torchscript vector Input | 0 | 278 | November 8, 2020
Pre-trained Sandwich transformer model | 0 | 300 | November 5, 2020
Best pre-trained transformer question answer model | 0 | 240 | November 5, 2020
Is there a pre-trained BERT model with the sequence length 2048? | 2 | 2074 | November 5, 2020
T5-base model create spelling mistake is summary | 2 | 763 | November 5, 2020