Topic | Replies | Views | Activity
What is the right way to save a checkpoint using Accelerator while training on multiple GPUs? | 2 | 1913 | January 24, 2024
Huggingface Seq2SeqTrainer uses Accelerate, so it cannot be run with DDP? | 1 | 556 | January 24, 2024
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 3 (pid: 10561) of binary | 4 | 4837 | January 24, 2024
Accelerate FSDP shows "Removed shared tensor {'model.norm.weight'} while saving." | 2 | 1947 | January 24, 2024
FSDP accelerate.prepare gives OOM. How to load the model into a single GPU, then distribute shards? | 2 | 1108 | January 24, 2024
When a tensor is generated from some_func(A.shape) (where A is a tensor), the generated tensor is placed on the CPU, not A's device | 1 | 230 | January 24, 2024
torch.Size([0]) on some layers when using Accelerate | 2 | 684 | January 24, 2024
How does compute/resource allocation work for multi-node hyperparameter search? | 0 | 187 | January 23, 2024
Setting optimizer parameters with DeepSpeed | 0 | 610 | January 22, 2024
"Out of memory" when loading quantized model | 1 | 1372 | January 22, 2024
Docs Clarification: Is prepare() inefficient for models that are frozen? | 0 | 196 | January 22, 2024
Is the trainer DDP or DP? | 0 | 288 | January 19, 2024
How to unload an adapter in PEFT? | 2 | 3421 | January 15, 2024
DataLoader from accelerator samples from beginning of dataset for last batch | 1 | 661 | January 15, 2024
Worse performance using Accelerate | 0 | 1049 | January 15, 2024
How to load a checkpoint model with SHARDED_STATE_DICT? | 5 | 1916 | January 11, 2024
Issue with accelerator.backward(loss) freezing | 0 | 530 | January 6, 2024
How to check whether the communication between multiple nodes is working well? | 1 | 358 | January 5, 2024
Hugging Face Accelerate and torch DDP crash with out-of-memory errors for a model that runs fine on a single GPU | 3 | 4450 | January 1, 2024
Accelerate stalls when using Tensor Dataset | 0 | 313 | December 31, 2023
No GPUs found in a machine definitely with GPUs | 8 | 7681 | December 27, 2023
Accelerate FSDP training || RuntimeError: Forward order differs across ranks | 0 | 457 | December 19, 2023
Getting mpi4py Error When Trying to Integrate Accelerate | 2 | 865 | December 12, 2023
SDXL Finetuning Script Not Working | 1 | 388 | December 10, 2023
How to collect the accuracy when running a multi-GPU model with Accelerate? | 3 | 979 | December 8, 2023
Accelerate - video encoding across GPUs fails | 0 | 193 | December 5, 2023
Multi Node GPU: `connecting to address with family 7299 is neither AF_INET(2) nor AF_INET6(10)` | 1 | 674 | December 2, 2023
ValueError: weight is on the meta device when using Auto Model For Sequence Classification | 2 | 1979 | November 30, 2023
Distributed GPU training not working | 2 | 4500 | November 30, 2023
Any good code/tutorial that shows how to do inference with Llama 2 70b on multiple GPUs with Accelerate? | 1 | 2770 | November 27, 2023