I am migrating my single-GPU code to Accelerate.
The model looks like this:
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, config, vocab, dev):
        super(Net, self).__init__()
        n_vocab = len(vocab)  # vocabulary size derived from the vocab argument
        self.tok_emb = nn.Embedding(n_vocab, config.n_embd)
        self.drop = nn.Dropout(config.d_dropout)
        ...
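For context, I create the Accelerator and prepare everything in the usual way before training; roughly like this (a sketch of my setup, where optimizer and train_dataloader are built as in my single-GPU code):

from accelerate import Accelerator

accelerator = Accelerator()
model = Net(config, vocab, dev)
# prepare() wraps the model in DistributedDataParallel when launched on multiple GPUs
model, optimizer, train_dataloader = accelerator.prepare(model, optimizer, train_dataloader)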
In my train_loop function, I have to use .module, as in the following code:
from tqdm import tqdm

def train_loop(train_dataloader, model, loss_fn, optimizer, accelerator):
    size = len(train_dataloader.dataset)
    # total assumes 8000 samples per batch
    for batch, (masked_array, masked_labels) in tqdm(enumerate(train_dataloader), total=size // 8000, leave=False):
        idxl = masked_array
        targetsl = masked_labels
        loss = 0
        loss_tmp = 0
        for chunk in range(len(idxl)):
            idx = idxl[chunk]
            targets = targetsl[chunk]
            b_element_size = len(idx)
            b, t = idx.size()
            # forward the model
            token_embeddings = model.module.tok_emb(idx)
            x = model.module.drop(token_embeddings)
            ...
If I use token_embeddings = model.tok_emb(idx) instead of token_embeddings = model.module.tok_emb(idx), accelerate launch crashes with the error AttributeError: 'DistributedDataParallel' object has no attribute 'tok_emb'.
My question is: why do I have to add .module? I cannot find relevant documentation on the Accelerate website. Will adding .module negatively impact training speed?
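For reference, I assume the alternative would be to go through accelerator.unwrap_model instead of hard-coding .module, roughly like this (a sketch; I have not verified that this is the recommended pattern):

# unwrap_model strips the DistributedDataParallel wrapper and returns the underlying nn.Module
unwrapped_model = accelerator.unwrap_model(model)
token_embeddings = unwrapped_model.tok_emb(idx)
x = unwrapped_model.drop(token_embeddings)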