XLNet from Scratch

I have been trying to train the Hugging Face XLNet model from scratch on my own data. I started with the default parameters, but even with very little data (5,000 entries) the runtime crashes.
Here is the training loop:

from tqdm import tqdm

epochs = 1
step = 0
loss_lst = []

for epoch in range(epochs):

    loop = tqdm(dataloader, leave=True)
    for batch in loop:

        optim.zero_grad()

        input_ids = batch['input_ids']#.to(device)
        attention_mask = batch['attention_mask']#.to(device)
        labels = batch['labels']#.to(device)
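
        # labels must be passed, otherwise outputs.loss is None
        # (assuming model_XLNet is an XLNetLMHeadModel)
        outputs = model_XLNet(input_ids, attention_mask=attention_mask, labels=labels)

        loss = outputs.loss
        loss.backward()
        optim.step()
        loop.set_description(f'Epoch {epoch}')
        loop.set_postfix(loss=loss.item())

        if step % 5 == 0:
            print('step: {} loss: {:.2f}'.format(step, loss.item()))
            # store a float, not the tensor, so each step's autograd graph can be freed
            loss_lst.append(loss.item())

        step += 1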
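For comparison with the loop above, here is a minimal, self-contained sketch of a few training steps on a randomly initialized XLNet. It assumes `torch` and `transformers` are installed; the small config values, the random token IDs, the helper `make_batch`, and the choice to predict the last three tokens of each sequence are all illustrative assumptions, not settings from the question. The default `XLNetConfig` (24 layers, `d_model=1024`, 32,000-token vocabulary) is large, so shrinking it is one way to keep memory in check. The sketch passes `labels` so that `outputs.loss` is populated, builds `perm_mask` and `target_mapping` following the pattern shown in the `XLNetLMHeadModel` documentation, and logs `loss.item()` rather than the loss tensor.

```python
import torch
from transformers import XLNetConfig, XLNetLMHeadModel

torch.manual_seed(0)

# Hypothetical small config; the defaults (24 layers, d_model=1024) are far larger.
config = XLNetConfig(vocab_size=1000, d_model=128, n_layer=2, n_head=4, d_inner=512)
model = XLNetLMHeadModel(config)  # randomly initialized, i.e. trained from scratch
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

batch_size, seq_len, num_predict = 4, 16, 3


def make_batch():
    """Hypothetical helper: random token IDs stand in for real tokenized data."""
    input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_len))
    attention_mask = torch.ones_like(input_ids)

    # perm_mask[b, i, j] = 1.0 means token i may NOT attend to token j.
    # Here no token may see the last `num_predict` tokens (the prediction targets).
    perm_mask = torch.zeros(batch_size, seq_len, seq_len)
    perm_mask[:, :, -num_predict:] = 1.0

    # target_mapping[b, k, j] = 1.0 means the k-th prediction is for position j.
    target_mapping = torch.zeros(batch_size, num_predict, seq_len)
    for k in range(num_predict):
        target_mapping[:, k, seq_len - num_predict + k] = 1.0

    # clone() makes the slice contiguous; shape (batch_size, num_predict)
    labels = input_ids[:, -num_predict:].clone()
    return input_ids, attention_mask, perm_mask, target_mapping, labels


loss_lst = []
model.train()
for step in range(3):
    input_ids, attention_mask, perm_mask, target_mapping, labels = make_batch()
    optim.zero_grad()
    outputs = model(
        input_ids,
        attention_mask=attention_mask,
        perm_mask=perm_mask,
        target_mapping=target_mapping,
        labels=labels,  # without labels, outputs.loss is None
    )
    loss = outputs.loss
    loss.backward()
    optim.step()
    loss_lst.append(loss.item())  # a float, so the autograd graph can be freed
    print(f"step: {step} loss: {loss.item():.2f}")
```

With `target_mapping` given, the logits have shape `(batch_size, num_predict, vocab_size)`, one row per predicted position, which is why `labels` has shape `(batch_size, num_predict)` here.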

Also, it would be great if someone could point me to resources that explain how to implement XLNet from scratch.
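To build intuition for the part of XLNet that differs most from BERT, here is a simplified sketch that turns one sampled factorization order into a `perm_mask`, using the same convention as the `perm_mask` argument of the Hugging Face XLNet models (1.0 means "may not attend"). It assumes only `torch`, and `sample_perm_mask` is a hypothetical helper. Real XLNet pretraining predicts only a subset of tokens per sequence and builds its masks differently (for example via `DataCollatorForPermutationLanguageModeling` in `transformers`), so treat this as an illustration rather than a drop-in data pipeline.

```python
import torch


def sample_perm_mask(seq_len, generator=None):
    """Hypothetical helper: simplified XLNet-style mask for one factorization order.

    Returns (order, perm_mask) where perm_mask[i, j] == 1.0 means position i
    may NOT attend to position j, i.e. j does not come strictly before i in
    the sampled order.
    """
    order = torch.randperm(seq_len, generator=generator)  # sampled factorization order
    rank = torch.empty(seq_len, dtype=torch.long)
    rank[order] = torch.arange(seq_len)  # rank[p] = step at which position p is predicted
    # Block j for i whenever rank[j] >= rank[i] (j is i itself or comes later).
    perm_mask = (rank[None, :] >= rank[:, None]).float()
    return order, perm_mask


g = torch.Generator().manual_seed(0)
order, perm_mask = sample_perm_mask(6, generator=g)
print("order:", order.tolist())
print(perm_mask)
```

The first position in the sampled order sees no other tokens, the last sees all of them, and the diagonal stays 1 because a token must not see itself when it is the one being predicted.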

Thank you