About the DeepSpeed category
|
|
1
|
676
|
October 30, 2021
|
Deepspeed zero-2 cpu offloading killing process = -9 error
|
|
1
|
648
|
March 17, 2024
|
Conceptual question: Early loading of the model defeats the purpose of deepspeed!
|
|
0
|
51
|
March 14, 2024
|
Struggling with fine-tuning flan-t5-xxl using DeepSpeed
|
|
3
|
553
|
March 12, 2024
|
Deepspeed inference stage 3 + quantization
|
|
0
|
100
|
March 8, 2024
|
Saving checkpoint is too slow with deepspeed
|
|
5
|
1040
|
March 6, 2024
|
Deepspeed trainer and custom loss weights
|
|
1
|
320
|
February 28, 2024
|
How can I use Inference API with my model?
|
|
0
|
58
|
February 24, 2024
|
Finetune LLM with DeepSpeed
|
|
2
|
3676
|
February 22, 2024
|
DeepSpeed integration for HuggingFace Seq2SeqTrainingArguments
|
|
0
|
256
|
February 22, 2024
|
It says that `bfloat16.enabled` without `auto` needs to be specified when training T5. Is anyone aware of how to solve that?
|
|
0
|
83
|
February 20, 2024
|
Exact difference between Transformers' and Accelerate's DeepSpeed integrations?
|
|
5
|
311
|
February 13, 2024
|
How to use GPU when using transformers.AutoModel
|
|
0
|
305
|
February 3, 2024
|
Multi GPU training - Model parallelism
|
|
1
|
1169
|
February 2, 2024
|
More processes than GPUs with DeepSpeed launcher
|
|
0
|
87
|
January 25, 2024
|
LoRA training with accelerate / deepspeed
|
|
0
|
321
|
January 22, 2024
|
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
|
|
3
|
1747
|
January 12, 2024
|
Overriding Trainer's save_model method produces an unexpected pytorch_model.bin file
|
|
0
|
136
|
January 8, 2024
|
Model (Pipeline) Parallelism in SLURM cluster
|
|
0
|
94
|
January 6, 2024
|
Mixtral bad FP16 performance
|
|
0
|
325
|
January 3, 2024
|
Deepspeed script launcher vs accelerate script launcher for TRL
|
|
0
|
200
|
December 25, 2023
|
Best practice to run DeepSpeed
|
|
2
|
1147
|
December 25, 2023
|
Codellama will not stop generating at EOS
|
|
0
|
307
|
December 20, 2023
|
Inference time increases when using multi-GPU
|
|
1
|
716
|
November 28, 2023
|
Resume_from_checkpoint does not configure learning rate scheduler correctly
|
|
3
|
496
|
November 28, 2023
|
What does LoRA do to model by default?
|
|
0
|
349
|
November 21, 2023
|
Running model.generate() in DeepSpeed training
|
|
0
|
189
|
November 19, 2023
|
The same hyperparameters perform worse with DeepSpeed than without DeepSpeed
|
|
2
|
335
|
November 13, 2023
|
Deepspeed stage 3 partition
|
|
0
|
281
|
October 31, 2023
|