Tutorials for using Colab TPUs with Huggingface Transformers?

facehugger2020 · November 10, 2020, 8:01pm

I am looking for an easy-to-follow tutorial for using Huggingface Transformer models (e.g. BERT) in PyTorch on Google Colab with TPUs. I found guides about XLA, but they are largely centered around TensorFlow.

Any help would be appreciated.

thomwolf · November 12, 2020, 12:55pm

There are a few contributed notebooks which might help you here: https://github.com/huggingface/transformers/tree/master/notebooks

For instance this one by @valhalla is about training T5 on TPU in PyTorch: https://colab.research.google.com/github/patil-suraj/exploring-T5/blob/master/T5_on_TPU.ipynb#scrollTo=QLGiFCDqvuil

facehugger2020 · December 6, 2020, 11:21pm

I still cannot get any HuggingFace Transformer model to train with a Google Colab TPU.

I tried out the notebook mentioned above illustrating T5 training on TPU, but it uses the Trainer API and the XLA code is very ad hoc.

I also tried a more principled approach based on an article by a PyTorch engineer.

My understanding is that using the GPU is simply a matter of creating a variable device and assigning it cuda, like this:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

And then you would move your model and your tensors to the device.

model = AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
model = model.to(device)


# Training loop:
    for i, batch in enumerate(dl):

        optimizer.zero_grad()

        input_ids      = batch['input_ids'].to(device)
        attention_mask = batch['attention_mask'].to(device)
        y              = batch['y'].to(device)

        result = model(input_ids=input_ids, attention_mask=attention_mask, return_dict=True)

To use a TPU, the article above mentioned creating an analogous device variable but setting it to use XLA.

import torch_xla
import torch_xla.core.xla_model as xm

device = xm.xla_device()

model = model.to(device)

I tried this ostensibly straightforward approach, but training runs extremely slowly, at practically the same speed as CPU-only training.

Here is my Google Colab notebook with my attempt. It runs well with GPU, but exceedingly slowly with TPU.

sgugger · December 7, 2020, 2:31pm

At first glance, you have loss.item() in your training loop, which you should absolutely avoid on TPUs (it’s a big slowdown). You should use loss.detach() to accumulate your losses on the TPU, then only call .item() at the very end of your epoch.

facehugger2020 · December 7, 2020, 8:58pm

Thanks. I did as you suggested, but the training loop is still making very slow progress.

OLD:

    epoch_loss = 0.0
    for i, batch in enumerate(dl):

        loss  = loss_fn(yhat, y)
        loss.backward()

        epoch_loss += loss.item()

    return epoch_loss/len(dl)

NEW:

    epoch_loss = 0.0
    for i, batch in enumerate(dl):

        loss  = loss_fn(yhat, y)
        loss.backward()

        epoch_loss += loss.detach() # <-- NEW

    return epoch_loss.item()/len(dl) # <-- NEW

One batch is still taking a long time to complete. I suspect it’s running on the CPU rather than the TPU. However, I think I followed all the XLA setup steps correctly. If this issue is out of Transformers’ domain, I’ll go bug the XLA folks.
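One quick way to test the "is it really on the CPU?" suspicion is to ask the model where its parameters live. The following is a minimal sketch, not taken from the notebook: `param_device` is a hypothetical helper and the tiny `nn.Linear` stands in for the real model. It falls back to CUDA or CPU when torch_xla is not installed.

```python
import torch
from torch import nn


def param_device(module: nn.Module) -> torch.device:
    """Return the device holding the module's first parameter."""
    return next(module.parameters()).device


# Stand-in for the real model (hypothetical).
model = nn.Linear(4, 2)

try:
    # Only importable on a runtime with torch_xla installed.
    import torch_xla.core.xla_model as xm
    device = xm.xla_device()
except ImportError:
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = model.to(device)

# Expected to report "xla" on a TPU runtime, otherwise "cuda" or "cpu".
print(param_device(model).type)
```

If this reports an `xla` device type, the tensors are at least placed on the XLA device. That alone does not show that each step executes efficiently there.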

sgugger · December 7, 2020, 9:59pm

Mmmm, I don’t see the call to the spawn function, so yes, you’re probably training on CPU. Normally, you are supposed to call train through

import torch_xla.distributed.xla_multiprocessing as xmp
xmp.spawn(train, args=potential_args, nprocs=num_tpu_cores)

facehugger2020 · December 7, 2020, 10:53pm

My understanding from reading the PyTorch XLA documentation is that xmp.spawn() is used for multi-TPU processing. For single-TPU training, you only need to define device correctly. The difference is also shown in the PyTorch example code for single-core AlexNet and multi-core AlexNet training.

At this time, I’m interested in just single-TPU execution.
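For reference, a single-device loop in the style of the PyTorch/XLA single-core examples might look like the sketch below. It is not the notebook's code: the toy linear model, the random data and `train_epoch` are hypothetical stand-ins, and the script falls back to CPU when torch_xla is missing. The one XLA-specific step is `xm.mark_step()` after `optimizer.step()`. As far as the PyTorch/XLA docs describe it, tensors are evaluated lazily, so without a ParallelLoader around the data loader something has to tell XLA to cut and execute the pending graph on each iteration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

try:
    import torch_xla.core.xla_model as xm  # present on TPU runtimes only
    device = xm.xla_device()
    mark_step = xm.mark_step  # executes the pending XLA graph
except ImportError:
    device = torch.device("cpu")

    def mark_step():
        """No-op when not running on XLA."""


def train_epoch(model, dl, optimizer, loss_fn):
    """Run one epoch and return the mean batch loss as a Python float."""
    model.train()
    epoch_loss = torch.zeros((), device=device)
    for x, y in dl:
        optimizer.zero_grad()
        x, y = x.to(device), y.to(device)
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        mark_step()
        epoch_loss += loss.detach()  # accumulate on device, no host sync
    return epoch_loss.item() / len(dl)  # single device-to-host transfer


# Toy data and model (hypothetical stand-ins for the BERT setup).
torch.manual_seed(0)
ds = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))
dl = DataLoader(ds, batch_size=16)
model = nn.Linear(8, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

mean_loss = train_epoch(model, dl, optimizer, nn.CrossEntropyLoss())
print(mean_loss)
```

On XLA the first few iterations are usually slow because the graph is being compiled, so timing only the first batch can be misleading.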

facehugger2020 · December 7, 2020, 11:15pm

I filed a GitHub issue with the XLA team.

tillfurger · February 15, 2021, 8:30pm

Any update on this?

finiteautomata · April 12, 2021, 7:09pm

I’ve successfully trained an NLI model using Colab’s TPUs. I had to struggle a little bit with configuring the environment, but thanks to the help of the folks at huggingface I was able to do it.

You can check it here:

PremalMatalia · June 14, 2021, 5:38pm

I can see an error while training in your colab notebook…was it really successful?

finiteautomata · June 17, 2021, 2:53pm

@PremalMatalia: you’re right. I couldn’t identify the source of the error (perhaps a memory limit on Colab?).

BTW, I reworked the notebook because it had some sub-optimal stuff (for example, preprocessing the whole dataset for each fork). Also, I added some explanations. Now it works ok and it’s faster than before, and I hope it’s more readable.

Check it back:

GenV · February 15, 2022, 4:05pm

@finiteautomata I’m trying this code now, but nothing is printed, so I can’t tell whether it’s working. Is there a way to see training progress as a percentage? Thanks

finiteautomata · February 15, 2022, 5:35pm

Hmm, there should be tqdm bars throughout the training. If you see nothing, perhaps it didn’t even start.

By the way, a few months on, I’d suggest picking a different task to learn how TPUs work. NLI is by no means a good example. Pre-training or fine-tuning a language model is a better use case for this hardware. Check the run_mlm.py example; it can be easily adapted using the same ideas as my earlier notebook.

GenV · February 16, 2022, 1:17pm

@finiteautomata I have opened a thread for this, and I’m running the official code → link. My problem is that the TPU is very, very slow on Google Colab and I don’t know why.

JKMO · December 8, 2022, 2:12pm

How does one learn exactly how to use a TPU and train from a notebook?

KraoESPfan1n · June 3, 2024, 1:57am

But how can I run an existing model on a TPU v2, for inference only rather than training?
