Topic | Replies | Views | Activity
FSDP FULL_SHARD: 3 GPUs works, 2 GPUs hangs at 1st step | 0 | 64 | August 26, 2024
Can I convert Llama 2 "Chat" model into ONNX using llama/convert_to_onnx.py script? | 5 | 1753 | August 26, 2024
Forward Pass Output Logits | 0 | 67 | August 26, 2024
I need help getting more accurate results after training | 0 | 53 | August 25, 2024
Run Any Model Without GPU for AMD EPYC 7282? | 0 | 57 | August 25, 2024
GPU over head using by5g | 0 | 14 | August 25, 2024
Backpropagation through a KandinskyV22Pipeline image generator | 0 | 11 | August 25, 2024
Can someone help guide how to finetune DeBERTa V3 model? | 1 | 1117 | August 25, 2024
Accelerate + Gemma2 + FSDP | 1 | 146 | August 25, 2024
What does "trim_offsets" do in tokenizer post-processor? | 0 | 42 | August 25, 2024
Study with AI developers and Researchers | 0 | 21 | August 25, 2024
RuntimeError: The size of tensor a (4096) must match the size of tensor b (4097) at non-singleton dimension 3 | 1 | 342 | August 24, 2024
Fine-Tune TrOCR on Arabic | 3 | 1413 | August 24, 2024
Space not building and showing no logs | 3 | 52 | August 24, 2024
Layer-specific fine-tuning of Whisper | 0 | 10 | August 24, 2024
How to use PyTorch to process variance sequence | 0 | 6 | August 24, 2024
Possible to rollback to a model's commit hash? | 0 | 331 | August 24, 2024
Trainer stuck mid epoch | 0 | 27 | August 24, 2024
Emotion dataset not available | 3 | 404 | August 24, 2024
Deploy Button Not Showing - Fine Tuned Llama 3.1 | 3 | 224 | August 24, 2024
Best practices to use models requiring flash_attn on Apple silicon Macs (or non CUDA)? | 2 | 5989 | August 23, 2024
Recovering IterableDataset state if it crashes mid stream | 0 | 29 | August 22, 2024
How does padding side affect training? | 0 | 219 | August 23, 2024
[SOLVED] Trying to fine-tune Llama, getting NaN gradients after a single step | 1 | 859 | August 23, 2024
Multi-Task Learning | 0 | 34 | August 23, 2024
How to train a combination model | 0 | 16 | August 23, 2024
Error deploying endpoint on AWS | 6 | 166 | August 23, 2024
When will AI architectures go native 2048x2048? | 0 | 21 | August 23, 2024
TGI and turn off Flash Attention v2 | 4 | 1630 | August 23, 2024
Access issues for gated repos | 3 | 3056 | August 23, 2024