Does anyone have working code for training T5-11B on multi-gpu? | 4 | 1054 | March 30, 2023
Avoid recalculating hidden states between generate calls? | 3 | 1204 | March 30, 2023
Is it possible to constraint text generation? | 0 | 408 | March 30, 2023
Vit-Finetuning Inference does not give logits | 0 | 236 | March 29, 2023
DistilBERT for Donut Decoder | 0 | 211 | March 29, 2023
Trainer's step loss always drops sharply after each epoch regardless of model / data | 3 | 2265 | March 28, 2023
T5 multilabel classification using tf | 0 | 508 | March 28, 2023
Best way to select which new model to implement | 0 | 411 | March 27, 2023
Reusing cached context to generate multiple sequences? | 1 | 228 | March 26, 2023
WordLevel Tokenization with GPT2? | 1 | 739 | March 26, 2023
Prohibition on loading models (Probable) | 0 | 486 | March 25, 2023
What is "scheduled LR warm-up"? | 0 | 329 | March 25, 2023
Implement details about Beam Search | 0 | 581 | March 25, 2023
T5-Small Greedy Search out of memory | 0 | 267 | March 25, 2023
New Spaces Primitive Feature Request - Gradio plus Torch plus Transformers Docker spin up? | 1 | 361 | March 25, 2023
How to save and retrieve trained ai locally from python backend | 0 | 362 | March 24, 2023
The input length for bert | 0 | 190 | March 24, 2023
Freeze encoder for some time and then unfreeze - does it improve the model? | 1 | 837 | March 24, 2023
Good word list in generate function | 1 | 626 | March 23, 2023
Troubleshooting | 0 | 229 | March 23, 2023
Transformers on GCP Training stuck on start | 3 | 1216 | March 22, 2023
[Help appreciated] Modifying load_tf_weights_in_albert for transforming ALBERT tensorflow checkpoint to pytorch model | 0 | 369 | March 22, 2023
[Feature Request] Listing available models, datasets and metrics | 0 | 853 | March 22, 2023
Why the checkpoint of old version of BERT can not be used for BERT with new version? | 0 | 312 | March 22, 2023
Huggingface Saving `VisionEncoderDecoderModel` to `TorchScript` problem | 0 | 653 | March 22, 2023
Batch tensor creation error when finetuning gpt2 | 2 | 446 | March 21, 2023
Training wav2vac2 requires a lot of compute power | 0 | 194 | March 21, 2023
ValueError: --optim adamw_torch_fused with --fp16 requires PyTorch>2.0 | 6 | 1838 | March 21, 2023
CPU usage during batched training | 0 | 312 | March 21, 2023
Deploy whisper by passing last transcribed sentences to decoder's past_key values | 0 | 294 | March 20, 2023