PPO Training does not improve SFT model outputs (Metrics identical before and after PPO)
|
|
0
|
9
|
May 18, 2025
|
Grouping by length makes training loss oscillate and makes evaluation loss worse
|
|
1
|
212
|
May 16, 2025
|
Cuda out of memory in SD3
|
|
4
|
15
|
May 16, 2025
|
StopIteration error
|
|
1
|
19
|
May 16, 2025
|
AttributeError: 'CustomQwen3Model' object has no attribute 'config'
|
|
1
|
7
|
May 16, 2025
|
How to freeze layers while fine-tuning?
|
|
2
|
23
|
May 16, 2025
|
Trainer default distributed training behaviour
|
|
2
|
9
|
May 15, 2025
|
What does increasing number of heads do in the Multi-head Attention?
|
|
5
|
29500
|
May 15, 2025
|
Does high number of output labels affect the performance of BERT and how to handle the class imbalance issue while doing multi text classification?
|
|
2
|
413
|
May 14, 2025
|
Mamba2 Cache Position
|
|
4
|
108
|
May 12, 2025
|
Building something that helps people who really need help using AI
|
|
4
|
36
|
May 12, 2025
|
I was excited about the D-FINE model, but I have got ABYSMAL Results
|
|
0
|
37
|
May 11, 2025
|
(first token generation puzzle) Why does transformers take the last dimension as output when generating the first token in the language generation process?
|
|
9
|
1947
|
May 11, 2025
|
Transformers: Informer model use for weather forecasting
|
|
1
|
10
|
May 9, 2025
|
Resolving "Cannot Perform Fine-Tuning on Purely Quantized Models" Error in Falcon Model Training?
|
|
4
|
8608
|
May 9, 2025
|
How to resume training from a checkpoint using huggingface trainer
|
|
5
|
82
|
May 8, 2025
|
You stopped providing https://huggingface.co/KBLab/sentence-bert-swedish-cased
|
|
1
|
13
|
May 8, 2025
|
AutoTokenizer.from_pretrained() suddenly raises an error
|
|
4
|
31
|
May 7, 2025
|
Wav2vec2 Access Feature Layers Performance
|
|
1
|
448
|
May 7, 2025
|
Prepare dataset from YOLO format to COCO for DETR
|
|
4
|
5000
|
May 6, 2025
|
[kv cache merge] I want to know whether computing their respective KV caches separately and concatenating them gives a correct result
|
|
5
|
40
|
May 6, 2025
|
Downloading a model from the hub without loading it
|
|
6
|
3675
|
May 5, 2025
|
Why are only 2 of the RT-DETR v2 implemented losses actually used?
|
|
3
|
35
|
May 5, 2025
|
500 Internal Error - We're working hard to fix this as soon as possible
|
|
44
|
1718
|
April 25, 2025
|
When I'm downloading the weights, the cell keeps running and doesn't stop. I need to fine-tune the Mistral-Small-3.1-24B-Instruct-2503 model
|
|
4
|
36
|
May 2, 2025
|
Why `inv_freq` when computing frequencies for RoPE
|
|
2
|
13
|
May 1, 2025
|
Using GRPOTrainer with a custom PyTorch module?
|
|
3
|
20
|
April 29, 2025
|
Trainer + Datasets + Pytorch Dataloader Workers - how to manage memory usage?
|
|
1
|
20
|
April 29, 2025
|
"No log" for training loss
|
|
0
|
7
|
April 29, 2025
|
Attention mask shape (custom attention masking)
|
|
3
|
538
|
April 27, 2025
|