When I'm downloading the weights, the cell keeps running and doesn't stop. I need to fine tune Mistral-Small-3.1-24B-Instruct-2503 model | 3 | 9 | May 2, 2025
Why `inv_freq` when computing frequencies for RoPE | 1 | 5 | May 1, 2025
Using GRPOTrainer with a custom PyTorch module? | 3 | 15 | April 29, 2025
Trainer + Datasets + Pytorch Dataloader Workers - how to manage memory usage? | 1 | 13 | April 29, 2025
"No log" for training loss | 0 | 5 | April 29, 2025
Attention mask shape (custom attention masking) | 3 | 503 | April 27, 2025
Fine Tuning Llava 1.5 7b for Classification | 1 | 15 | April 27, 2025
How to use customized compute_metrics in trainer | 1 | 8 | April 26, 2025
500 Internal Error - We're working hard to fix this as soon as possible | 44 | 1569 | April 25, 2025
How to force the assistant to write some tokens mid-generation? | 0 | 7 | April 23, 2025
Ethical AI x Narrative Intervention | 0 | 11 | April 24, 2025
How to start fsdp2 when using trainer? | 0 | 28 | April 23, 2025
Saving pretrained to same directory as load | 2 | 22 | April 23, 2025
Can't perform image inference with Gemma 3 12b it qat4.0 | 1 | 36 | April 23, 2025
Sample weighting in DPOTrainer | 0 | 8 | April 23, 2025
How to avoid PreTrainedTokenizerFast.decode to add space between tokens | 3 | 13 | April 22, 2025
How can I make use of GPU manually to run inference faster? | 3 | 22 | April 22, 2025
Error using deepspeed for sftconfig | 1 | 12 | April 21, 2025
AI Microsoft hackthon 4=1 | 0 | 8 | April 21, 2025
Deepspeed zero3 does not work with Diffusion Models. Does anyone know how to fix this? | 1 | 2059 | April 18, 2025
Code from HF tutorial on the customization of transformer components is not working as intended | 4 | 26 | April 18, 2025
The effect of padding_side | 12 | 13443 | April 17, 2025
The current text generation call will exceed the model's predefined maximum length | 1 | 2321 | April 16, 2025
Why are only 2 of the RT-DETR v2 implemented losses actually used? | 0 | 22 | April 16, 2025
SSL Certificate Issue | 11 | 24347 | April 16, 2025
Push_to_hub() stucked | 5 | 39 | April 15, 2025
Distributed Training w/ Trainer | 9 | 8672 | April 14, 2025
ValueError: Image features and image tokens do not match | 2 | 335 | April 14, 2025
[Owlv2 - image_guided_detection - embed_image_query] Why choosing the least similar box from selected ones? | 5 | 590 | April 13, 2025
How to properly load the PEFT LoRA model | 4 | 6586 | April 13, 2025