| Topic | Replies | Views | Activity |
|---|---|---|---|
| Predictions with pipeline fails to truncate test set | 0 | 181 | January 23, 2024 |
| Memory requirements for DPO 7b model | 0 | 347 | January 23, 2024 |
| Training a Translation Model from Scratch: Guidelines for English to X Translation with a Custom Dataset | 2 | 480 | January 22, 2024 |
| Use custom text encoder in CLIP | 3 | 1503 | January 22, 2024 |
| SFTTrainer checkpointing | 6 | 6467 | January 21, 2024 |
| Domain adaptation fine tune VS instruction_tuned | 2 | 3231 | January 21, 2024 |
| How does one reinitialize the weights of a Hugging Face LLaMA v2 model the official way as the original model? | 4 | 4536 | January 20, 2024 |
| Optimizing LLM Inference with One Base LLM and Multiple LoRA Adapters for Memory Efficiency | 1 | 4752 | January 20, 2024 |
| How to use the already-trained transformer model to make predictions using new data? | 0 | 289 | January 19, 2024 |
| When Fine-Tune the google/vit-base-patch16-384, the train loss is 0 and the eval loss is NaN | 9 | 763 | January 19, 2024 |
| Output probabilities very skewed | 4 | 277 | January 19, 2024 |
| In transformer, if the text exceeds max_seq_length, how to deal with it | 0 | 381 | January 19, 2024 |
| How can I restrict the GPU usage in this case? | 0 | 205 | January 19, 2024 |
| What makes the built-in generate method faster than a crude manual implementation? | 3 | 2000 | January 19, 2024 |
| Trainer() and required_grad=false | 1 | 282 | January 18, 2024 |
| Setting requires_grad=False seems not saving GPU memory usage | 0 | 327 | January 18, 2024 |
| How to train causal language model | 0 | 334 | January 18, 2024 |
| Domain adaptation with MLM and NSP | 3 | 1735 | January 18, 2024 |
| Whisper fine tuning | 0 | 432 | January 18, 2024 |
| Early stopping + trainer + hub | 3 | 4089 | January 17, 2024 |
| Trainer, device error cuda:0 and cuda:1 | 3 | 3555 | January 17, 2024 |
| T5ForConditionalGeneration checkpoint size mismatch #19418 | 1 | 2579 | January 17, 2024 |
| Multi-label text classification error | 0 | 301 | January 17, 2024 |
| Finetuning Whisper with prompts | 3 | 4164 | January 16, 2024 |
| Llama 2 7B fine-tuned with IA3 errors when performing inference | 2 | 647 | January 16, 2024 |
| Any example of using gradio with apply_chat_template? | 0 | 230 | January 16, 2024 |
| YOLOS Coco Labels mismatch | 1 | 370 | January 16, 2024 |
| Can I use fp16 model for mixed precision training? | 0 | 296 | January 16, 2024 |
| Why is only the parameter `attention_mask` singular? | 0 | 103 | January 16, 2024 |
| Batch generation with GPT2 | 12 | 17207 | January 16, 2024 |