| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the DeepSpeed category | 1 | 805 | October 30, 2021 |
| Prakash Hinduja, Geneva (Swiss) How can I ask effective technical questions on the Hugging Face forum? | 1 | 33 | August 4, 2025 |
| AttributeError: 'ORTTrainingArguments' object has no attribute 'deepspeed_plugin' | 2 | 511 | August 2, 2025 |
| Timeout Issue with DeepSpeed on Multiple GPUs | 2 | 674 | July 21, 2025 |
| Use ReduceLROnPlateau with deepspeed | 4 | 49 | June 26, 2025 |
| How to use different learning rates when deepspeed enabled | 1 | 46 | June 14, 2025 |
| LoRA training with accelerate / deepspeed | 3 | 2623 | May 28, 2025 |
| Incorrect total train batch size when using tp_size > 1 and deepspeed | 1 | 122 | May 20, 2025 |
| Error using deepspeed for sftconfig | 1 | 73 | April 21, 2025 |
| Deepspeed zero3 does not work with Diffusion Models. Does anyone know how to fix this? | 1 | 2403 | April 18, 2025 |
| Corrupted deepspeed checkpoint | 1 | 220 | March 13, 2025 |
| SFTTrainer Doubling Speed on a Single GPU with DeepSpeed: Proposal for an Update to the Official Documentation and Verification Report | 1 | 89 | March 7, 2025 |
| Accelerator.backward freeze | 1 | 86 | February 24, 2025 |
| Deepspeed ZeRO-3 flattens convolution that causes runtime error | 0 | 238 | February 17, 2025 |
| Is there a way to terminate llm.generate and release the GPU memory for next prompt? | 1 | 245 | February 4, 2025 |
| CUDA OOM on first backward pass after evaluation | 0 | 312 | November 20, 2024 |
| Different metrics score between when training and when merge lora adapter testing | 1 | 157 | October 25, 2024 |
| Trainer leaked memory? | 1 | 803 | October 15, 2024 |
| DeepSpeed MII pipeline issue | 1 | 43 | September 30, 2024 |
| Deepspeed mii library issues | 1 | 90 | September 29, 2024 |
| Calculate tokens per second while fine-tuning llm? | 0 | 154 | September 17, 2024 |
| Fitting huge models on multiple nodes | 0 | 195 | September 6, 2024 |
| RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select) | 5 | 3489 | August 26, 2024 |
| AutoTrain Error DeepSpeed Zero-3 | 1 | 310 | August 21, 2024 |
| DeepSpeed Zero 3 with LoRA - Merging adapters | 1 | 807 | August 16, 2024 |
| DeepSpeed error: a leaf Variable that requires grad is being used in an in-place operation | 1 | 89 | July 26, 2024 |
| Running model.generate() in deep speed training | 2 | 564 | July 25, 2024 |
| RuntimeError: Error building extension 'cpu_adam' | 4 | 5319 | July 23, 2024 |
| Saving checkpoints when using DeepSpeed is taking abnormally long | 0 | 199 | July 22, 2024 |
| GPU memory usage of optimizer's states when using LoRA | 4 | 871 | July 5, 2024 |