Accelerate not performing distributed training

I’m following this tutorial
https://huggingface.co/docs/transformers/accelerate
in order to perform distributed training on various g5 SageMaker instances, and I've refactored my code in the same way the tutorial shows. No matter how large the G5 instance is, or how many instances I use for the SageMaker training job, it always fails whenever the maximum number of tokens is greater than 18. Below is an example code snippet:

from accelerate import Accelerator
from datasets import Dataset
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    get_linear_schedule_with_warmup,
)
import torch

# sentences, questions, yes_or_no, learning_rate, and batch_count are all
# defined earlier in the script.

accelerator = Accelerator()

nli_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)  # , torch_dtype="auto"
tokenizer.pad_token = tokenizer.eos_token

optimizer = AdamW(
    nli_model.parameters(),
    lr=learning_rate,  # previously 8e-6
    eps=1e-8,          # args.adam_epsilon - default is 1e-8
)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=0,   # default value in run_glue.py
    num_training_steps=1,
)

toks = tokenizer(sentences, questions, padding="longest")

ds = Dataset.from_dict(
    {
        "x": torch.tensor(toks["input_ids"]),
        "mask": torch.tensor(toks["attention_mask"]),
        "labels": torch.tensor([0 if i == "no" else 1 for i in yes_or_no]),
    }
).with_format("torch")

dataloader = DataLoader(ds, batch_size=batch_count)

# Let Accelerate wrap the model, optimizer, dataloader, and scheduler for
# (potentially) distributed execution.
nli_model, optimizer, dataloader, scheduler = accelerator.prepare(
    nli_model, optimizer, dataloader, scheduler
)

nli_model.train()
for batch in dataloader:
    # No manual .to(device) calls -- accelerator.prepare() handles placement.
    x_batch = batch["x"]
    mask_batch = batch["mask"]
    labels_batch = batch["labels"]

    loss = nli_model(x_batch, attention_mask=mask_batch, labels=labels_batch)[0]
    accelerator.backward(loss)  # instead of loss.backward()

    optimizer.step()
    scheduler.step()
    optimizer.zero_grad()
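
One thing I can do to check whether Accelerate actually sees more than one process is to print a few of its attributes right after the Accelerator is created. This is a minimal diagnostic sketch only; num_processes, process_index, distributed_type, and device are standard Accelerator attributes.

from accelerate import Accelerator

# Diagnostic sketch only -- in the script above, the existing `accelerator`
# object can be printed instead of creating a new one here.
accelerator = Accelerator()
print(f"num_processes:    {accelerator.num_processes}")    # 1 => single process, single GPU
print(f"process_index:    {accelerator.process_index}")
print(f"distributed_type: {accelerator.distributed_type}")  # e.g. DistributedType.NO vs. MULTI_GPU
print(f"device:           {accelerator.device}")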

Is there a reason multiple GPUs are never being used in these SageMaker training jobs?

How are you calling your script? Just doing python myscript.py will not work.
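
With Accelerate, the script normally has to be started through accelerate launch (or torchrun), which is what starts multiple worker processes; plain python only ever gives you a single process. If the environment insists on running python entry_point.py, one option is a small wrapper entry point that hands off to accelerate launch. This is a minimal sketch with hypothetical file names, not a verified SageMaker setup:

# launcher.py -- hypothetical wrapper entry point (illustrative names only).
# The container runs this file with plain `python`; it then re-launches the
# real training script through `accelerate launch` so that multiple worker
# processes can be started.
import subprocess
import sys

if __name__ == "__main__":
    subprocess.run(
        ["accelerate", "launch", "--multi_gpu", "train.py", *sys.argv[1:]],
        check=True,
    )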

Here is how I'm starting the code:

from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch

pt_estimator = PyTorch(
    entry_point="ph_1_5_with_accelerator.py",
    source_dir="source_dir_phi_1_5",
    role=get_execution_role(),
    framework_version="1.10.2",
    py_version="py38",
    instance_count=1,
    instance_type="ml.g5.16xlarge",
    distribution={
        "pytorchddp": {
            "enabled": True   # I've also tried commenting the distribution block out entirely
        }
    }
)

pt_estimator.fit()
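
For comparison, here is a sketch of the kind of multi-process configuration I've been looking at. The framework version, Python version, and instance type below are illustrative assumptions, not a verified fix:

# Sketch only -- versions and instance type are assumptions, not a verified fix.
from sagemaker import get_execution_role
from sagemaker.pytorch import PyTorch

pt_estimator = PyTorch(
    entry_point="ph_1_5_with_accelerator.py",
    source_dir="source_dir_phi_1_5",
    role=get_execution_role(),
    framework_version="1.13.1",      # assumption: a framework version that supports pytorchddp
    py_version="py39",
    instance_count=1,
    instance_type="ml.g5.12xlarge",  # assumption: an instance type that exposes multiple GPUs
    distribution={"pytorchddp": {"enabled": True}},
)

pt_estimator.fit()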