Topic | Replies | Views | Activity
Accelerate TPU training | 0 | 120 | July 5, 2024
GPU memory calculator | 2 | 1724 | July 5, 2024
How to do data parallelism for num_return_sequences in generation pipeline | 0 | 91 | July 2, 2024
Accelerator.device always show xla:0 not opus | 0 | 112 | July 2, 2024
Accelerator.__init__() got an unexpected keyword argument 'use_seedable_sampler' | 2 | 2447 | June 26, 2024
Why does the training time differ? | 1 | 311 | June 25, 2024
How loss/metric reporting works with deepspeed and transformers.Trainer? | 0 | 140 | June 24, 2024
Early stopping for eval loss causes timeout? | 10 | 1673 | June 20, 2024
What does unwrapping a model do and why use this? | 0 | 196 | June 18, 2024
Accelerate config in Seq2SeqTrainer | 0 | 141 | June 17, 2024
LLama3-8B - FSDP + QLORA results in OOM with 4 A40's | 1 | 799 | June 17, 2024
Multi-GPU Issue when trying Diffusers demo | 0 | 533 | June 16, 2024
How to pass `ProjectConfig` to `accelerate launch` command? | 0 | 110 | June 14, 2024
Resume training with lesser GPUs Error rng_state_6.pth | 0 | 169 | June 13, 2024
Lora finetuning 35 B model error | 0 | 142 | June 11, 2024
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! I am on a single T4 GPU | 6 | 1122 | June 10, 2024
Extremely slow loading with accelerate 0.31.0? | 2 | 308 | June 10, 2024
Feature Request: Add DDP Communication Hooks | 2 | 289 | June 9, 2024
Low bf16 performance on TPU, int4 vs int8 quantization | 0 | 333 | June 1, 2024
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! | 1 | 731 | May 31, 2024
Weights & Biases sweep with multi gpu accelerate launch | 4 | 2629 | May 28, 2024
ORPO Trainer giving error when fine-tuning Llama3-8b in Multi-GPU environment | 8 | 1162 | May 27, 2024
Segmentation fault core dumped (Solved) | 1 | 628 | May 27, 2024
How to do distributed Inference for large models with multiprocess? | 3 | 613 | May 26, 2024
ValueError (unknown key enable_cpu_affinity) on SageMaker for Accelerate >=0.29.0 | 3 | 1654 | May 22, 2024
Getting the error: AssertionError: Non-root FSDP instance's `_is_root` should not have been set yet or should have been set to `False` while Finetuning GPT2 model | 0 | 395 | May 21, 2024
Hugging Face Trainer class with accelerate | 2 | 376 | May 21, 2024
Feature Request: Elastic Launch Support in `notebook_launcher` | 0 | 127 | May 16, 2024
Degraded results after loading from checkpoint | 0 | 151 | May 13, 2024
How to launch multi node training using accelerate launch | 0 | 550 | May 13, 2024