Set TPU device in Trainer

Hi,

I want to use the TPU provided by Kaggle in my project. I use PyTorch XLA to do that:

import torch_xla
import torch_xla.core.xla_model as xm
device = xm.xla_device()

Then I define a model:

model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

And as far as I can see, the model is on the xla device, so that part is fine:

model.device # device(type='xla', index=1)

Then I define a Trainer instance with my model:

trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

And train it:

trainer.train()

But it seems to me that the Trainer does not use the xla device, because the TPU device is idle in Kaggle…

So, how can I use the TPU device in Kaggle with PyTorch XLA in my case?

No, the Trainer does not support training on TPUs inside a Colab or Kaggle notebook. You have to use it in a script (like the example scripts) and launch training with our launcher (see here for the instructions).
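
For reference, launching one of the example scripts on a TPU typically looks something like the sketch below. This is only an illustrative command that assumes the xla_spawn.py launcher and the run_mlm.py script shipped with the transformers examples; the dataset, output directory, and batch size are placeholders you would adapt to your own setup:

python xla_spawn.py --num_cores 8 \
    run_mlm.py \
    --model_name_or_path xlm-roberta-base \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 8 \
    --output_dir ./mlm_output

Under the hood, xla_spawn.py spawns one process per TPU core (via torch_xla.distributed.xla_multiprocessing), so the Trainer picks up the XLA devices without you having to set them manually.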


Thank you very much!

First, I have tried to run the example, but it does not work… I run it in a Kaggle notebook (is that correct?) and the session stops because of the memory usage limit.

How much memory am I supposed to have? Is Kaggle okay for this?

I don’t see which memory you have maxed out in your screenshot. Is it the TPU memory? That seems unlikely for this example and that batch size.

It is probably a Kaggle question, but here is another screenshot. It looks to me like the RAM just can’t handle it.