DeepSpeed

Topic	Replies	Views	Activity
About the DeepSpeed category	1	714	October 30, 2021
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)	4	2227	May 9, 2024
Why is activation memory computed through an experiment rather than formulated for the DeepSpeed autotuner?	0	42	May 6, 2024
I cannot find the code that transformers trainer model_wrapped by deepspeed , i can find the theory about model_wrapped was wraped by DDP(Deepspeed(transformer model )) ,but i only find the code transformers model wrapped by ddp, where is the deepspeed wr	1	49	May 1, 2024
Model Parallelism	0	57	April 21, 2024
What should I do if I want to use model from DeepSpeed	5	1412	April 6, 2024
[Maybe Bug] When using EarlyStopping callbacks with Seq2SeqTrainer, training didn't stop	3	1134	April 4, 2024
ncclSystemError: System call (e.g. socket, malloc) or external library call failed or device error	0	338	March 30, 2024
Deepspeed zero-2 cpu offloading killing process = -9 error	1	864	March 17, 2024
Conceptual question: Early loading of the model defeats the purpose of deepspeed!	0	97	March 14, 2024
Struggling with fine-tuning flan-t5-xxl using DeepSpeed	3	641	March 12, 2024
Deepspeed inference stage 3 + quantization	0	312	March 8, 2024
Saving checkpoint is too slow with deepspeed	5	1324	March 6, 2024
Deepspeed trainer and custom loss weights	1	382	February 28, 2024
How can I use Inference API with my model?	0	92	February 24, 2024
Finetune LLM with DeepSpeed	2	4035	February 22, 2024
DeepSpeed integration for HuggingFace Seq2SeqTrainingArguments	0	515	February 22, 2024
It says that `bfloat16.enabled` without `auto` needs to be specified when training T5; is anyone aware of how to solve that?	0	122	February 20, 2024
Exact difference between Transformers' and Accelerate's DeepSpeed integrations?	5	463	February 13, 2024
How to use GPU when using transformers.AutoModel	0	601	February 3, 2024
Multi GPU training - Model parallelism	1	1379	February 2, 2024
More processes than GPUs with DeepSpeed launcher	0	133	January 25, 2024
LoRA training with accelerate / deepspeed	0	622	January 22, 2024
Overriding Trainer's save_model method produces an unexpected pytorch_model.bin file	0	197	January 8, 2024
Model (Pipeline) Parallelism in SLURM cluster	0	158	January 6, 2024
Mixtral bad FP16 performance	0	401	January 3, 2024
Deepspeed script launcher vs accelerate script launcher for TRL	0	263	December 25, 2023
Best practice to run DeepSpeed	2	1259	December 25, 2023
Codellama will not stop generating at EOS	0	375	December 20, 2023
Inference time increases when using multi-GPU	1	786	November 28, 2023