DeepSpeed

Topic	Replies	Views	Activity
Unable to train model (Loss is 0.000000)	2	1085	October 17, 2023
Pix2struct based model ddp code conversion	1	310	October 11, 2023
Speed up beam search for item generation	1	934	October 4, 2023
Hi why its not working?	6	3032	August 24, 2023
Learning rate with deepspeed is fixed despite lr set to auto	2	2138	September 6, 2023
Model connection timed out, even on simple requests	0	304	August 31, 2023
How does from_pretrained work with ZeRO=3?	0	673	August 14, 2023
ZeRO3 with int8 training	0	866	August 11, 2023
How to add java_home in HF space(spark + llama)	0	366	August 8, 2023
ZeRO uses more RAM than DDP?	0	1008	August 7, 2023
Eval_batch_size VS per_device_eval_batch_size	0	878	August 4, 2023
AttributeError: 'ORTTrainingArguments' object has no attribute 'deepspeed_plugin'	0	495	August 1, 2023
NCCL timeout + corrupts checkpoint/latest	1	2534	July 31, 2023
RuntimeError: tensors must be contiguous when finetuning GPT-J-6B using PEFT Lora	0	866	July 29, 2023
Deepspeed inference and infinity offload with bitsandbytes 4bit loaded models	2	3828	July 27, 2023
Parallelizing huggingface models	0	347	July 24, 2023
Fine-tuning a 16B CodeGen model with 256GB RAM+2xA6000s?	2	1642	July 3, 2023
Estimate training compute for 150B LLM	0	529	June 30, 2023
No module named 'deepspeed.checkpoint.utils'	6	2071	June 28, 2023
Difference between using the Trainer class vs Accelerate library	0	896	June 27, 2023
How to use Whisper from huggingface for ASR	0	535	June 21, 2023
Trainer) training one batch with multiple GPUs	0	384	June 19, 2023
Multi-GPU sharded eval with Trainer and generate method during training	1	758	May 25, 2023
How do you know which parameter is used for ZeRO?	0	247	May 24, 2023
How to Create one Process But Using Multi GPU?	0	713	May 15, 2023
DeepSpeed config file not found	0	600	May 13, 2023
Use decoder_input_ids with deepspeed	0	269	May 9, 2023
Is it true that Deepspeed currently does not support regression tasks and only supports softmax-based classification tasks?	0	274	April 21, 2023
[Question] How to generate a merge file and a vocab file	0	360	April 17, 2023
Does anyone have working code for training T5-11B on multi-gpu?	4	1044	March 30, 2023