I want to fine-tune an LM using DeepSpeed with some ZeRO stage. However, the setup involves another model which evaluates the LM-in-training and only does inference. Preparing the latter without an associated optimizer throws an error about this ZeRO stage requiring an optimizer. However, for the inference model, I only want to parallelize it across devices without any fancy ZeRO partitioning at all. Is that possible?
I was hoping to be able to pass a custom config dict to
accelerator.prepare(), but there’s no such option. Any advice?
Hello @paulbricman, given that you only want to do inference with the second model (the eval model), just don’t pass it to
accelerator.prepare(). That way, the eval model has an identical copy on each GPU with the same parameters, which can be used for inference without any issues. Let us know if this solves your query. Thank you.
Thanks for the quick reply.
As in, consistently move the model (I’m using pipelines) to accelerator.device, and the Accelerate wrapper will place each copy on the right device while the other model is “prepared”?
Yes, for eval model you would have to do the following
eval_model = eval_model.to(accelerator.device)