| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the DeepSpeed category | 1 | 783 | October 30, 2021 |
| DeepSpeed MII pipeline issue | 1 | 8 | September 30, 2024 |
| DeepSpeed MII library issues | 1 | 14 | September 29, 2024 |
| How to specify FSDP config without using Accelerate | 0 | 8 | September 23, 2024 |
| Calculate tokens per second while fine-tuning LLM? | 0 | 10 | September 17, 2024 |
| Fitting huge models on multiple nodes | 0 | 19 | September 6, 2024 |
| RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select) | 5 | 2900 | August 26, 2024 |
| AutoTrain Error DeepSpeed Zero-3 | 1 | 84 | August 21, 2024 |
| DeepSpeed Zero 3 with LoRA - Merging adapters | 1 | 181 | August 16, 2024 |
| LoRA training with accelerate / deepspeed | 2 | 1219 | August 8, 2024 |
| DeepSpeed error: a leaf Variable that requires grad is being used in an in-place operation | 1 | 25 | July 26, 2024 |
| Running model.generate() in DeepSpeed training | 2 | 426 | July 25, 2024 |
| RuntimeError: Error building extension 'cpu_adam' | 4 | 4656 | July 23, 2024 |
| Saving checkpoints when using DeepSpeed is taking abnormally long | 0 | 50 | July 22, 2024 |
| GPU memory usage of optimizer's states when using LoRA | 4 | 263 | July 5, 2024 |
| Saving weights while finetuning is on | 0 | 84 | June 13, 2024 |
| DeepSpeed ZeRO2, PEFT, bitsandbytes training | 0 | 97 | June 4, 2024 |
| Codellama will not stop generating at EOS | 1 | 505 | June 2, 2024 |
| CUDA OOM error when `ignore_mismatched_sizes` is enabled | 0 | 131 | May 31, 2024 |
| Why is activation memory computed through an experiment rather than formulated for the DeepSpeed autotuner | 0 | 79 | May 6, 2024 |
| I cannot find the code where the transformers Trainer's model_wrapped is wrapped by DeepSpeed. I can find the theory that model_wrapped is wrapped as DDP(Deepspeed(transformer model)), but I only find the code where the transformers model is wrapped by DDP, where is the deepspeed wr | 1 | 114 | May 1, 2024 |
| Model Parallelism | 0 | 173 | April 21, 2024 |
| What should I do if I want to use model from DeepSpeed | 5 | 1543 | April 6, 2024 |
| [Maybe Bug] When using EarlyStopping Callbacks with Seq2SeqTrainer, training didn't stop | 3 | 1335 | April 4, 2024 |
| ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error | 0 | 842 | March 30, 2024 |
| DeepSpeed zero-2 cpu offloading killing process = -9 error | 1 | 1282 | March 17, 2024 |
| Conceptual question: Early loading of the model defeats the purpose of deepspeed! | 0 | 151 | March 14, 2024 |
| Struggle with fine-tuning flan-t5-xxl using DeepSpeed | 3 | 784 | March 12, 2024 |
| DeepSpeed inference stage 3 + quantization | 0 | 713 | March 8, 2024 |
| Saving checkpoint is too slow with DeepSpeed | 5 | 1971 | March 6, 2024 |