Tutorials for using Colab TPUs with Huggingface Transformers?

I still cannot get any HuggingFace Transformers model to train on a Google Colab TPU.

I tried the notebook mentioned above, which illustrates T5 training on a TPU, but it uses the Trainer API and its XLA code is very ad hoc.

I also tried a more principled approach based on an article by a PyTorch engineer.

My understanding is that using the GPU is simply a matter of creating a `device` variable and setting it to `cuda`, like this:

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```

Then you move your model and your tensors to that device:

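```python
import torch.nn.functional as F
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
model = model.to(device)

# Training loop (`dl` is a DataLoader yielding dicts of tensors and `optimizer`
# is e.g. torch.optim.AdamW(model.parameters()); both are defined elsewhere)
model.train()
for i, batch in enumerate(dl):
    optimizer.zero_grad()

    # Move every input tensor to the same device as the model
    input_ids      = batch['input_ids'].to(device)
    attention_mask = batch['attention_mask'].to(device)
    y              = batch['y'].to(device)

    result = model(input_ids=input_ids, attention_mask=attention_mask, return_dict=True)
    loss = F.cross_entropy(result.logits, y)  # y holds integer class labels
    loss.backward()
    optimizer.step()
```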

To use a TPU, the article above describes creating an analogous `device` variable that points to an XLA device instead:

```python
import torch_xla
import torch_xla.core.xla_model as xm

# Acquire the TPU as an XLA device
device = xm.xla_device()

model = model.to(device)
```

I tried this ostensibly straightforward approach, but when I run training it is extremely slow, practically the same speed as CPU-only training.
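One frequently cited cause of this kind of slowdown is recompilation: PyTorch/XLA compiles a graph for each distinct set of input shapes, so dynamic padding (each batch padded only to its own longest sequence) can trigger a fresh compilation on almost every step. Padding every batch to one fixed length keeps shapes static; with a Hugging Face tokenizer that corresponds to `padding='max_length', truncation=True, max_length=...`. Whether this is the cause in the notebook below is an assumption; a missing `xm.mark_step()` (or `xm.optimizer_step(optimizer)`) each iteration is another common culprit. The pure-Python sketch below uses a hypothetical helper `pad_batch` and made-up token ids, and simply counts how many distinct batch shapes each padding strategy produces, as a stand-in for the number of XLA compilations.

```python
from typing import Dict, List, Tuple


def pad_batch(seqs: List[List[int]], max_length: int, pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: truncate/pad every sequence to exactly max_length tokens."""
    input_ids, attention_mask = [], []
    for seq in seqs:
        seq = seq[:max_length]                                # truncate long sequences
        n_pad = max_length - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)              # right-pad with pad_id
        attention_mask.append([1] * len(seq) + [0] * n_pad)   # 1 = real token, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}


def dynamic_pad_batch(seqs: List[List[int]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Dynamic padding: pad only to the longest sequence in this particular batch."""
    longest = max(len(s) for s in seqs)
    return pad_batch(seqs, longest, pad_id)


def batch_shape(batch: Dict[str, List[List[int]]]) -> Tuple[int, int]:
    """(batch_size, sequence_length) of the padded input_ids."""
    ids = batch["input_ids"]
    return (len(ids), len(ids[0]))


# Toy batches of made-up token ids; every batch has 2 sequences of varying length
batches = [
    [[101, 7592, 102], [101, 2088, 2003, 102]],
    [[101, 102], [101, 7592, 102]],
    [[101, 1037, 2146, 6251, 2182, 102], [101, 102]],
]

# Each distinct shape would mean one more graph compilation on XLA
dynamic_shapes = {batch_shape(dynamic_pad_batch(b)) for b in batches}
fixed_shapes = {batch_shape(pad_batch(b, max_length=8)) for b in batches}
print(len(dynamic_shapes), len(fixed_shapes))  # prints "3 1"
```

Passing `drop_last=True` to the `DataLoader` serves the same purpose for the batch dimension, since a smaller final batch is also a new shape.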

Here is my Google Colab notebook with my attempt. It runs well on a GPU, but extremely slowly on a TPU.
