I am migrating my single-GPU code to Accelerate.
The model looks like this:
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, config, vocab, dev):
        super(Net, self).__init__()
        n_vocab = len(vocab)  # vocabulary size derived from the vocab argument
        self.tok_emb = nn.Embedding(n_vocab, config.n_embd)
        self.drop = nn.Dropout(config.d_dropout)
        ...
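For context, I create the Accelerator and prepare everything in the usual way before training; roughly like this (a sketch of my setup, where optimizer and train_dataloader are built as in my single-GPU code):

from accelerate import Accelerator

accelerator = Accelerator()
model = Net(config, vocab, dev)
# prepare() wraps the model in DistributedDataParallel when launched on multiple GPUs
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)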
In my train_loop function, I have to use .module, as in the following code:
from tqdm import tqdm

def train_loop(train_dataloader, model, loss_fn, optimizer, accelerator):
    size = len(train_dataloader.dataset)
    # total assumes 8000 samples per batch
    for batch, (masked_array, masked_labels) in tqdm(enumerate(train_dataloader), total=size // 8000, leave=False):
        idxl = masked_array
        targetsl = masked_labels
        loss = 0
        loss_tmp = 0
        for chunk in range(len(idxl)):
            idx = idxl[chunk]
            targets = targetsl[chunk]
            b_element_size = len(idx)
            b, t = idx.size()
            # forward the model
            token_embeddings = model.module.tok_emb(idx)
            x = model.module.drop(token_embeddings)
            ...
If I use token_embeddings = model.tok_emb(idx) instead of token_embeddings = model.module.tok_emb(idx), accelerate launch crashes with the error AttributeError: 'DistributedDataParallel' object has no attribute 'tok_emb'.
My question is: why do I have to add .module? I cannot find relevant documentation on the Accelerate website. Will adding .module negatively impact training speed?
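For reference, I assume the alternative would be to go through accelerator.unwrap_model instead of hard-coding .module, roughly like this (a sketch; I have not verified that this is the recommended pattern):

# unwrap_model strips the DistributedDataParallel wrapper and returns the underlying nn.Module
unwrapped_model = accelerator.unwrap_model(model)
token_embeddings = unwrapped_model.tok_emb(idx)
x = unwrapped_model.drop(token_embeddings)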