Topic | Replies | Views | Activity
NEFTune doesn't seem to be working | 2 | 31 | February 9, 2025
Best way to extend vocabulary of pretrained model? | 3 | 2785 | February 9, 2025
SFTTrainer training very slow on GPU. Is this training speed expected? | 4 | 179 | February 8, 2025
Saving checkpoints *only* on improvement | 2 | 60 | February 8, 2025
Automatically converts text into videos with relevant visuals and narration | 2 | 1 | February 17, 2025
I wonder how to merge my PEFT adapter with the base model and finally get a new whole model? | 27 | 418 | February 7, 2025
Running DPOTrainer with custom gpu management | 0 | 22 | February 7, 2025
Accelerate use of memory | 1 | 19 | February 7, 2025
Trainer API for data parallel on multi-node | 4 | 41 | February 6, 2025
How to know if a word is OOV or not with my model | 1 | 314 | February 4, 2025
Is there a way to terminate llm.generate and release the GPU memory for next prompt? | 1 | 78 | February 4, 2025
Create a weighted loss function to handle imbalance? | 3 | 382 | February 3, 2025
Loading quantized model on CPU only | 6 | 17395 | February 3, 2025
meta-llama/Llama-2-7b-chat-hf weird responses, compared to the ones returned by the HF API | 1 | 64 | February 2, 2025
Is LLaMA rotary embedding implementation correct? | 7 | 7219 | February 1, 2025
Trainer being very slow to init training setting group_by_length to True | 1 | 265 | February 1, 2025
Using model() instead of model.generate() | 3 | 240 | January 30, 2025
Download DeepSeek R1 685B locally for future fine-tuning | 2 | 1798 | January 31, 2025
Java version of transformers library? | 1 | 38 | January 30, 2025
Llama-2 find answer in a transcript | 1 | 18 | January 30, 2025
Setting up my custom device map for a LLM | 3 | 4607 | January 29, 2025
Can Donut model be used to query Multipage documents? | 3 | 1443 | January 29, 2025
Cannot use Hugging Face cache on a read-only filesystem | 3 | 116 | January 29, 2025
PPOTrainer + LoRA and Continued Training | 0 | 57 | January 28, 2025
ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format | 2 | 61 | January 28, 2025
Convert RT-DETR model to coreml | 3 | 63 | January 27, 2025
Logits from generate and model call different | 2 | 818 | January 26, 2025
Problem generating with T5ForConditionalGeneration on a custom task | 2 | 26 | January 26, 2025
GPTQ quantization on Custom dataset | 4 | 552 | January 24, 2025
The Best Approach for Weighted Multilabel Classification | 1 | 37 | January 24, 2025