Deepspeed inference stage 3 + quantization
|
|
0
|
784
|
March 8, 2024
|
Saving checkpoint is too slow with deepspeed
|
|
5
|
2143
|
March 6, 2024
|
Deepspeed trainer and custom loss weights
|
|
1
|
508
|
February 28, 2024
|
How can I use Inference API with my model?
|
|
0
|
146
|
February 24, 2024
|
Finetune LLM with DeepSpeed
|
|
2
|
4688
|
February 22, 2024
|
DeepSpeed integration for HuggingFace Seq2SeqTrainingArguments
|
|
0
|
1115
|
February 22, 2024
|
It says that `bfloat16.enabled` needs to be specified without `auto` when training T5; does anyone know how to solve that?
|
|
0
|
226
|
February 20, 2024
|
Exact difference between Transformers' and Accelerate's DeepSpeed integrations?
|
|
5
|
658
|
February 13, 2024
|
How to use GPU when using transformers.AutoModel
|
|
0
|
1312
|
February 3, 2024
|
Multi GPU training - Model parallelism
|
|
1
|
1693
|
February 2, 2024
|
More processes than GPUs with DeepSpeed launcher
|
|
0
|
213
|
January 25, 2024
|
Rewriting trainer's save_model method produces an unexpected pytorch_model.bin file
|
|
0
|
333
|
January 8, 2024
|
Model (Pipeline) Parallelism in SLURM cluster
|
|
0
|
225
|
January 6, 2024
|
Mixtral bad FP16 performance
|
|
0
|
500
|
January 3, 2024
|
Deepspeed script launcher vs accelerate script launcher for TRL
|
|
0
|
337
|
December 25, 2023
|
Best practice to run DeepSpeed
|
|
2
|
1440
|
December 25, 2023
|
Inference time increases when using multi-GPU
|
|
1
|
865
|
November 28, 2023
|
Resume_from_checkpoint does not configure learning rate scheduler correctly
|
|
3
|
794
|
November 28, 2023
|
What does LoRA do to model by default?
|
|
0
|
518
|
November 21, 2023
|
The same hyperparameters with deepspeed are worse than without deepspeed
|
|
2
|
415
|
November 13, 2023
|
Deepspeed stage 3 partition
|
|
0
|
545
|
October 31, 2023
|
Unable to train model (Loss is 0.000000)
|
|
2
|
1028
|
October 17, 2023
|
Pix2struct based model ddp code conversion
|
|
1
|
306
|
October 11, 2023
|
Speed up beam search for item generation
|
|
1
|
882
|
October 4, 2023
|
Hi, why is it not working?
|
|
6
|
2799
|
August 24, 2023
|
Learning rate with deepspeed is fixed despite lr set to auto
|
|
2
|
1898
|
September 6, 2023
|
Model connection timed out, even on simple requests
|
|
0
|
298
|
August 31, 2023
|
How does from_pretrained work with ZeRO=3?
|
|
0
|
634
|
August 14, 2023
|
ZeRO3 with int8 training
|
|
0
|
764
|
August 11, 2023
|
How to add java_home in HF space (spark + llama)
|
|
0
|
359
|
August 8, 2023
|