Set TPU device in Trainer

Hi,

I want to use the TPU provided by Kaggle in my project. I use PyTorch XLA to do that:

import torch_xla
import torch_xla.core.xla_model as xm
device = xm.xla_device()

Then I define a model:

model = AutoModelForMaskedLM.from_pretrained("xlm-roberta-base")

And as far as I can see, the model is on the xla device, so that part is fine:

model.device # device(type='xla', index=1)

Then I define a Trainer instance with my model:

trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

And train it:

trainer.train()

But it seems to me that the Trainer does not use the xla device, because the TPU device is idle in Kaggle…

So, how can I use the TPU device in Kaggle with PyTorch XLA in my case?

No, the Trainer does not support training on TPUs inside a Colab or Kaggle notebook. You have to use it in a script (like the example scripts) and launch training with our launcher (see here for the instructions).
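
For reference, launching one of the example scripts on a TPU typically looks something like the sketch below. This is only an illustrative command that assumes the xla_spawn.py launcher and the run_mlm.py script shipped with the transformers examples; the dataset, output directory, and batch size are placeholders you would adapt to your own setup:

python xla_spawn.py --num_cores 8 \
    run_mlm.py \
    --model_name_or_path xlm-roberta-base \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --do_train \
    --do_eval \
    --per_device_train_batch_size 8 \
    --output_dir ./mlm_output

Under the hood, xla_spawn.py spawns one process per TPU core (via torch_xla.distributed.xla_multiprocessing), so the Trainer picks up the XLA devices without you having to set them manually.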


Thank you very much!

First, I have tried to run the example, but it does not work… I run it in a Kaggle notebook (is that correct?) and the session stops because of the memory usage limit.

How much memory am I supposed to have? Is Kaggle okay for this?

I don’t see which memory you have maxed out in your screenshot. Is it the TPU memory? That seems unlikely for this example and that batch size.

It is probably a Kaggle question, but here is another screenshot. It looks to me like the RAM just can’t handle it.