| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Changing the default branch from master to main | 0 | 920 | March 21, 2022 |
| Value error : Connection error | 8 | 16219 | August 4, 2021 |
| Getting total train_runtime even if training stopped in the middle | 0 | 878 | March 20, 2022 |
| Freezing layers when using gradient checkpointing | 0 | 713 | March 20, 2022 |
| Domain-specific pre-training of GPT? Help! | 1 | 650 | March 18, 2022 |
| NER at the Inference Time | 0 | 441 | March 18, 2022 |
| About the Cross-attention Layer Shape in Encoder-Decoder Model | 1 | 1914 | March 18, 2022 |
| Transformer loss | 0 | 286 | March 17, 2022 |
| Error Loading google/bart-large or bart-xsum | 1 | 359 | March 17, 2022 |
| Batch_decode does not give the correct output as generate | 0 | 301 | March 17, 2022 |
| Sentiment Analysis Pipeline on single label function_to_apply not working | 1 | 1032 | March 17, 2022 |
| Using trainer to train a bart model on 4 gpus failed | 0 | 338 | March 16, 2022 |
| Pre-training a language model on a large dataset | 5 | 3887 | March 15, 2022 |
| Fnet with upper case | 0 | 277 | March 15, 2022 |
| Continue LM pretraining with run_mlm - loss function clarification | 0 | 460 | March 14, 2022 |
| Training arguments for flax | 0 | 253 | March 14, 2022 |
| Need help understanding input of model in generation | 0 | 251 | March 14, 2022 |
| How to extend the vocab of T5? | 0 | 432 | March 14, 2022 |
| Use tf.data.Data with HuggingFace datasets | 2 | 2641 | March 22, 2021 |
| Why we add math to word embedding | 0 | 262 | March 13, 2022 |
| Convert tokens and token-labels to string | 7 | 7631 | March 12, 2022 |
| BigBirdPegasus with attention_type="original_full" vs T5 | 0 | 254 | March 11, 2022 |
| NLP Pretrained model doesn't use GPU when making inference | 11 | 10152 | March 11, 2022 |
| Adding linear layer to transformer model (+ save_pretrained and load_pretrained) | 1 | 3784 | March 10, 2022 |
| Differences between transformers GPT2 and megatron-lm? | 0 | 382 | March 10, 2022 |
| When can we expect TPU Trainer? | 4 | 4067 | March 3, 2022 |
| Is there a way to get per word loss instead of the average loss for GPT model | 0 | 334 | March 7, 2022 |
| Ensemble decoding | 0 | 566 | March 7, 2022 |
| Torch JIT Training | 0 | 1166 | March 7, 2022 |
| BartForConditionalGeneration : lm_head layer dimension change | 0 | 445 | March 7, 2022 |