Adding .module fixed my problem, but I'm confused why

I am migrating my single-GPU code to Accelerate.

The model looks like:

import torch.nn as nn

class Net(nn.Module):
    def __init__(self, config, vocab, dev):
        super(Net, self).__init__()
        self.tok_emb = nn.Embedding(n_vocab, config.n_embd)
        self.drop = nn.Dropout(config.d_dropout)
        ...

In my train_loop function, I have to use .module, as in the following code:

from tqdm import tqdm

def train_loop(train_dataloader, model, loss_fn, optimizer, accelerator):
    size = len(train_dataloader.dataset)
    for batch, (masked_array, masked_labels) in tqdm(enumerate(train_dataloader), total=size // 8000, leave=False):
        idxl = masked_array
        targetsl = masked_labels
        
        loss = 0
        loss_tmp = 0
        for chunk in range(len(idxl)):
            idx = idxl[chunk]
            targets = targetsl[chunk]
            b_element_size = len(idx)
            b, t = idx.size()
            # forward the model
            token_embeddings = model.module.tok_emb(idx) 
            x = model.module.drop(token_embeddings)
            ...

If I use token_embeddings = model.tok_emb(idx) instead of token_embeddings = model.module.tok_emb(idx), accelerate launch crashes with the error AttributeError: 'DistributedDataParallel' object has no attribute 'tok_emb'.

My question is: why do I have to add .module? I cannot find the relevant documentation on the Accelerate website. Will adding .module negatively impact training speed?

Ideally, if you need to do stuff like this, it's best to keep everything inside your model's .forward() method if possible.
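For example, something roughly along these lines (just a sketch reusing the names from your snippet, with the chunking and loss computation left out):

class Net(nn.Module):
    def __init__(self, config, vocab, dev):
        super(Net, self).__init__()
        self.tok_emb = nn.Embedding(n_vocab, config.n_embd)
        self.drop = nn.Dropout(config.d_dropout)
        ...

    def forward(self, idx):
        # everything that was previously done on submodules inside train_loop
        token_embeddings = self.tok_emb(idx)
        x = self.drop(token_embeddings)
        ...
        return x

Then in train_loop you only ever call the wrapped model itself:

x = model(idx)  # goes through the DDP wrapper's forward, no .module needed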

.module is your original model, so you can access it that way, but where weight gradients etc. are concerned, it's not the best thing in the world to skip model.forward() and call the submodules yourself.
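Concretely, when you launch on more than one GPU, accelerator.prepare() hands your Net back wrapped in torch's DistributedDataParallel, and that wrapper doesn't expose the wrapped module's attributes, which is exactly the AttributeError you saw. A minimal sketch with a toy module (not your exact Net) to illustrate:

import torch
import torch.nn as nn
from accelerate import Accelerator

class TinyNet(nn.Module):  # stand-in for the Net above
    def __init__(self):
        super().__init__()
        self.tok_emb = nn.Embedding(100, 16)

    def forward(self, idx):
        return self.tok_emb(idx)

accelerator = Accelerator()
net = TinyNet()
optimizer = torch.optim.AdamW(net.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(net, optimizer)

# Under a multi-process `accelerate launch`, `model` is now a DistributedDataParallel
# wrapper: the original TinyNet (and attributes like tok_emb) lives at model.module,
# which is why model.module.tok_emb works while model.tok_emb raises AttributeError.
original = accelerator.unwrap_model(model)  # Accelerate's helper for the underlying module
assert isinstance(original, TinyNet)

accelerator.unwrap_model() is fine for things like saving weights or reading attributes, but for the forward pass itself it's better to call model(...) so the DDP wrapper's own forward (and the gradient synchronization tied to it) actually runs.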

You may potentially see some slowdowns this way, yes.
