Following the non-Trainer DeepSpeed integration docs, which say:

> When not using [`Trainer`], to efficiently deploy DeepSpeed ZeRO-3, you must instantiate the [`HfDeepSpeedConfig`] object before instantiating the model and keep that object alive.

I found that the parameters are partitioned across the different ranks. However, if I want to resize the tokenizer (e.g. after adding new tokens), can I still call `resize_token_embeddings` after `from_pretrained`?
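For concreteness, here is a minimal sketch of the setup I mean. The model name, the added token, and the ZeRO-3 config values are placeholders, not my real setup; the point is only the ordering of `HfDeepSpeedConfig`, `from_pretrained`, and `resize_token_embeddings`:

```python
# Hypothetical ZeRO-3 config; real runs would also set optimizer,
# batch size, etc. to match the training script.
ds_config = {
    "zero_optimization": {"stage": 3},
    "train_micro_batch_size_per_gpu": 1,
}

if __name__ == "__main__":
    # Imports are kept inside the guard so the config above can be
    # inspected without `transformers`/`deepspeed` installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from transformers.integrations import HfDeepSpeedConfig

    # Per the docs: construct HfDeepSpeedConfig BEFORE from_pretrained
    # and keep the object alive, so ZeRO-3 partitions weights at load time.
    dschf = HfDeepSpeedConfig(ds_config)

    model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    # The step in question: resize after the tokenizer grows, while the
    # embedding weights are already partitioned across ranks.
    tokenizer.add_tokens(["<extra_0>"])  # placeholder token
    model.resize_token_embeddings(len(tokenizer))
```

Is this resize call safe once the embedding matrix is already partitioned, or does it need to happen before/under some ZeRO gathering context?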