What should I do if I want to use a model from DeepSpeed?

I am pre-training a language model using run_mlm.py. I want to add a mixture-of-experts (MoE) layer to the original BERT-base model. Specifically, I reuse the MoELayer implemented by DeepSpeed and add it to BertForMaskedLM. From the DeepSpeed documentation, I see that training with DeepSpeed requires calling a function like this:

model_engine, optimizer, _, _ = deepspeed.initialize(args=cmd_args,
                                                     model=net,
                                                     model_parameters=net.parameters())

However, I just want to reuse the MoE layer implemented by DeepSpeed while keeping Hugging Face's training behavior. Currently, I skip calling this function and pass the language model (with the DeepSpeed MoE layer inside) directly into Trainer(). Although it runs successfully, my question is: does this incur any potential dangers?
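
To make the setup concrete, here is a rough sketch of what I mean by "passing the model directly into Trainer()". It is only an illustration, not verified code: the dataset handling is elided (train_dataset is a placeholder for your tokenized MLM dataset), and depending on the DeepSpeed version, constructing deepspeed.moe.layer.MoE may require the distributed backend to be initialized first (e.g. via deepspeed.init_distributed()), even on a single GPU.

from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Build the BERT MLM model; the DeepSpeed MoE layer is spliced into its FFN blocks.
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
# ... splice deepspeed.moe.layer.MoE into the FFN of model.bert.encoder.layer here ...

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Plain Hugging Face training loop: deepspeed.initialize() is never called.
args = TrainingArguments(output_dir="bert-moe-mlm",
                         per_device_train_batch_size=8,
                         num_train_epochs=1)
trainer = Trainer(model=model,
                  args=args,
                  data_collator=collator,
                  train_dataset=train_dataset)   # placeholder: your tokenized MLM dataset
trainer.train()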


Hi ezio98.

I can’t answer your question, but I’m a bit confused. From what I have read about the MoE layer, the point of it is to facilitate the use (Mixture) of many different models (Experts) concurrently, but you say you are using the MoE layer on top of a single BertForMaskedLM model.

What are you hoping the MoE layer will do for you? Does it have some other advantages?

Hi rgwatwormhill, sorry for not explaining it clearly. Actually, I combine the MoE with the FFN module, following the design of the Switch Transformer. That paper claims this design can improve training efficiency, although I haven't observed that in my experiments yet…
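
Concretely, the change is roughly the sketch below. It is only illustrative and untested: make_switch_ffn and moe_feed_forward_chunk are made-up helper names, I bypass BertOutput's dense projection for simplicity and drop the MoE's auxiliary load-balancing loss, and (as noted above) DeepSpeed may require the distributed backend to be set up before the MoE layer can be constructed. k=1 corresponds to the top-1 routing used by the Switch Transformer.

import types
import torch.nn as nn
from deepspeed.moe.layer import MoE
from transformers import BertForMaskedLM

def make_switch_ffn(config, num_experts=4):
    # One expert = the usual BERT FFN (dense -> GELU -> dense); k=1 gives top-1 routing.
    expert = nn.Sequential(
        nn.Linear(config.hidden_size, config.intermediate_size),
        nn.GELU(),
        nn.Linear(config.intermediate_size, config.hidden_size),
    )
    return MoE(hidden_size=config.hidden_size, expert=expert,
               num_experts=num_experts, k=1)

def moe_feed_forward_chunk(self, attention_output):
    # Stands in for BertLayer.feed_forward_chunk: route tokens through the MoE,
    # then reuse BertOutput's dropout + residual + LayerNorm.
    # The auxiliary load-balancing loss (l_aux) is ignored in this sketch.
    moe_output, l_aux, _ = self.moe_ffn(attention_output)
    return self.output.LayerNorm(self.output.dropout(moe_output) + attention_output)

model = BertForMaskedLM.from_pretrained("bert-base-uncased")
for layer in model.bert.encoder.layer:
    layer.moe_ffn = make_switch_ffn(model.config)   # registered as a submodule of the layer
    layer.feed_forward_chunk = types.MethodType(moe_feed_forward_chunk, layer)

The patched model is then passed to Trainer() exactly as in my first post.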

Hi,

I also plan to extend a Hugging Face Transformer model with DeepSpeed MoE. Have you had any success in doing so? Can the model run and train successfully by simply adding the MoE layer to the FFN layer?

Not yet. At this stage the two libraries are quite different, but you can give it a try now.

Hey, were you able to succeed in doing so?