T5-small trained with small dataset not inferring anything
|
|
0
|
212
|
April 25, 2023
|
T5 for classification task
|
|
0
|
489
|
April 25, 2023
|
RTX 6000 Ada slower than 3090
|
|
0
|
614
|
April 25, 2023
|
Question about using trainer with DeepSpeed
|
|
0
|
465
|
April 25, 2023
|
Issues in finetuning t5-large model
|
|
1
|
462
|
April 25, 2023
|
How to use FSDP + DDP in Trainer
|
|
1
|
1019
|
April 24, 2023
|
Is detokenize available in transformer lib?
|
|
2
|
2806
|
April 24, 2023
|
Tied weights for encoder and decoder vocab matrix hard coded in T5?
|
|
0
|
901
|
April 24, 2023
|
About fill-mask pipeline with [mask] made up of multiple tokens
|
|
0
|
325
|
April 24, 2023
|
Generation using contrastive search
|
|
0
|
179
|
April 24, 2023
|
Trade off between max_length vs loss
|
|
0
|
198
|
April 23, 2023
|
Mt5 fine-tuning using fp16 yields zero loss
|
|
1
|
643
|
April 23, 2023
|
Issues loading NLLB 54B MoE model for multi-GPU inferencing using accelerate
|
|
0
|
902
|
April 22, 2023
|
Support for ASR inference on longer audio files or on live transcription?
|
|
2
|
481
|
April 21, 2023
|
Whisper on long audio files -- support for chunking?
|
|
3
|
5810
|
April 21, 2023
|
What happens when loading shards?
|
|
0
|
2521
|
April 21, 2023
|
How can I introspect the input and output keys for an arbitrary model?
|
|
1
|
439
|
April 21, 2023
|
Question about Bloom pretrain
|
|
0
|
166
|
April 21, 2023
|
Is it true that Deepspeed currently does not support regression tasks and only supports softmax-based classification tasks?
|
|
0
|
275
|
April 21, 2023
|
How to prevent redownloading in from_pretrained caused by hash?
|
|
0
|
583
|
April 21, 2023
|
How does Segformer handle image size differences?
|
|
5
|
4128
|
April 20, 2023
|
Does anyone else observe RoBERTa fine-tuning instability?
|
|
8
|
3140
|
April 20, 2023
|
Image classification tutorial bug
|
|
0
|
216
|
April 20, 2023
|
LayoutLMv3 Onnx Conversion
|
|
1
|
815
|
April 20, 2023
|
Trying to understand the task-specific head for diff. models + Transformers AutoModel
|
|
0
|
436
|
April 20, 2023
|
Fusion-in-Decoder models
|
|
3
|
2984
|
April 20, 2023
|
How to get the score of the response of the model?
|
|
0
|
201
|
April 19, 2023
|
Same model GPT-NEO-XT behaves differently with same prompts & different context
|
|
0
|
277
|
April 19, 2023
|
Force word embeddings for a specific language with facebook/m2m100_418M
|
|
0
|
213
|
April 19, 2023
|
Type hinting Inconsistency in beam_search.py
|
|
0
|
189
|
April 19, 2023
|