Running model.generate() in DeepSpeed training | 0 | 233 | November 19, 2023
The same hyperparameters with deepspeed is worse than without deepspeed | 2 | 354 | November 13, 2023
Deepspeed stage 3 partition | 0 | 340 | October 31, 2023
Unable to train model (Loss is 0.000000) | 2 | 806 | October 17, 2023
Pix2struct based model ddp code conversion | 1 | 223 | October 11, 2023
Speed up beam search for item generation | 1 | 744 | October 4, 2023
Hi why its not working? | 6 | 2328 | August 24, 2023
Learning rate with deepspeed is fixed despite lr set to auto | 2 | 1274 | September 6, 2023
Model connection timed out, even on simple requests | 0 | 234 | August 31, 2023
RuntimeError: Error building extension 'cpu_adam' | 3 | 3602 | August 30, 2023
How does from_pretrained work with ZeRO=3? | 0 | 465 | August 14, 2023
ZeRO3 with int8 training | 0 | 474 | August 11, 2023
How to add java_home in HF space(spark + llama) | 0 | 294 | August 8, 2023
ZeRO uses more RAM than DDP? | 0 | 604 | August 7, 2023
Eval_batch_size VS per_device_eval_batch_size | 0 | 563 | August 4, 2023
AttributeError: 'ORTTrainingArguments' object has no attribute 'deepspeed_plugin' | 0 | 361 | August 1, 2023
NCCL timeout + corrupts checkpoint/latest | 1 | 1716 | July 31, 2023
RuntimeError: tensors must be contiguous when finetuning GPT-J-6B using PEFT Lora | 0 | 576 | July 29, 2023
Deepspeed inference and infinity offload with bitsandbytes 4bit loaded models | 2 | 2674 | July 27, 2023
Parallelizing huggingface models | 0 | 230 | July 24, 2023
Fine-tuning a 16B CodeGen model with 256GB RAM+2xA6000s? | 2 | 1430 | July 3, 2023
Estimate training compute for 150B LLM | 0 | 368 | June 30, 2023
No module named 'deepspeed.checkpoint.utils' | 6 | 1382 | June 28, 2023
Difference between using the Trainer class vs Accelerate library | 0 | 675 | June 27, 2023
How to use Whisper from huggingface for ASR | 0 | 425 | June 21, 2023
Trainer) training one batch with multiple GPUs | 0 | 260 | June 19, 2023
Multi-GPU sharded eval with Trainer and generate method during training | 1 | 616 | May 25, 2023
How do you know which parameter is used for ZeRO? | 0 | 211 | May 24, 2023
How to Create one Process But Using Multi GPU? | 0 | 523 | May 15, 2023
DeepSpeed config file not found | 0 | 441 | May 13, 2023