| Saving weights while finetuning is on | 0 | 108 | June 13, 2024 |
| Deepspeed ZeRO2, PEFT, bitsandbytes training | 0 | 139 | June 4, 2024 |
| Codellama will not stop generating at EOS | 1 | 603 | June 2, 2024 |
| CUDA OOM error when `ignore_mismatched_sizes` is enabled | 0 | 232 | May 31, 2024 |
| Why activations memory is computed through an experiment rather than formulating it for DeepSpeed autotuner | 0 | 91 | May 6, 2024 |
| I cannot find the code where the transformers trainer model_wrapped is wrapped by deepspeed. I can find the theory that model_wrapped is wrapped as DDP(Deepspeed(transformer model)), but I only find the code where the transformers model is wrapped by ddp; where is the deepspeed wr | 1 | 150 | May 1, 2024 |
| Model Parallelism | 0 | 193 | April 21, 2024 |
| What should I do if I want to use model from DeepSpeed | 5 | 1660 | April 6, 2024 |
| [Maybe Bug] When using EarlyStopping Callbacks with Seq2SeqTrainer, training didn't stop | 3 | 1598 | April 4, 2024 |
| ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error | 0 | 1236 | March 30, 2024 |
| Deepspeed zero-2 cpu offloading killing process = -9 error | 1 | 1875 | March 17, 2024 |
| Conceptual question: Early loading of the model defeats the purpose of deepspeed! | 0 | 164 | March 14, 2024 |
| Struggle with finetuning flan-t5-xxl using deepspeed | 3 | 866 | March 12, 2024 |
| Deepspeed inference stage 3 + quantization | 0 | 1032 | March 8, 2024 |
| Saving checkpoint is too slow with deepspeed | 5 | 3003 | March 6, 2024 |
| Deepspeed trainer and custom loss weights | 1 | 578 | February 28, 2024 |
| How can I use Inference API with my model? | 0 | 149 | February 24, 2024 |
| Finetune LLM with DeepSpeed | 2 | 5175 | February 22, 2024 |
| DeepSpeed integration for HuggingFace Seq2SeqTrainingArguments | 0 | 1551 | February 22, 2024 |
| It says that `bfloat16.enabled` without `auto` needed to be specified when training T5, is anyone aware of how to solve that? | 0 | 272 | February 20, 2024 |
| Exact difference between Transformers' and Accelerate's DeepSpeed integrations? | 5 | 866 | February 13, 2024 |
| How to use GPU when using transformers.AutoModel | 0 | 1802 | February 3, 2024 |
| Multi GPU training - Model parallelism | 1 | 1937 | February 2, 2024 |
| More processes than GPUs with DeepSpeed launcher | 0 | 242 | January 25, 2024 |
| Rewrite trainer's save_model method get unexpected pytorch_model.bin file | 0 | 415 | January 8, 2024 |
| Model (Pipeline) Parallelism in SLURM cluster | 0 | 255 | January 6, 2024 |
| Mixtral bad FP16 performance | 0 | 533 | January 3, 2024 |
| Deepspeed script launcher vs accelerate script launcher for TRL | 0 | 375 | December 25, 2023 |
| Best practice to run DeepSpeed | 2 | 1587 | December 25, 2023 |
| Inference time increases when using multi-GPU | 1 | 889 | November 28, 2023 |