Following the non-Trainer DeepSpeed integration docs, which say:

> When not using [`Trainer`], to efficiently deploy DeepSpeed ZeRO-3, you must instantiate the [`HfDeepSpeedConfig`] object before instantiating the model and keep that object alive.

I found that the parameters are partitioned across the different ranks. However, if I want to resize the tokenizer (e.g. after adding new tokens), can I still call `resize_token_embeddings` after `from_pretrained`?
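For concreteness, here is a minimal sketch of the setup I mean. The model name, the added token, and the ZeRO-3 config values are placeholders, not my real setup; the point is only the ordering of `HfDeepSpeedConfig`, `from_pretrained`, and `resize_token_embeddings`:

```python
# Hypothetical ZeRO-3 config; real runs would also set optimizer,
# batch size, etc. to match the training script.
ds_config = {
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 1,
}

if __name__ == "__main__":
    # Imports are kept inside the guard so the config above can be
    # inspected without `transformers`/`deepspeed` installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from transformers.integrations import HfDeepSpeedConfig

    # Per the docs: construct HfDeepSpeedConfig BEFORE from_pretrained
    # and keep the object alive, so ZeRO-3 partitions weights at load time.
    dschf = HfDeepSpeedConfig(ds_config)

    model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    # The step in question: resize after the tokenizer grows, while the
    # embedding weights are already partitioned across ranks.
    tokenizer.add_tokens(["<extra_0>"])  # placeholder token
    model.resize_token_embeddings(len(tokenizer))
```

Is this resize call safe once the embedding matrix is already partitioned, or does it need to happen before/under some ZeRO gathering context?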