How do I get the Trainer API to use the GPU?

I am following this pretrain example, but I always get a CUDA out of memory error, even though I have 2 GPUs available with 16 GB of memory each.

The code is below, copied exactly from the tutorial:

    from datasets import load_dataset
    from transformers import AutoTokenizer
    from transformers import AutoModelForSequenceClassification
    from transformers import TrainingArguments
    from transformers import Trainer
    from torch.utils.data import DataLoader
    from datasets import load_metric

    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)

    raw_datasets = load_dataset("imdb")
    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

    tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)
    small_train_dataset = tokenized_datasets["train"].shuffle(seed=42).select(range(1000))
    small_eval_dataset = tokenized_datasets["test"].shuffle(seed=42).select(range(1000))
    full_train_dataset = tokenized_datasets["train"]
    full_eval_dataset = tokenized_datasets["test"]

    model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased", num_labels=2)
    training_args = TrainingArguments("test_trainer")

    trainer = Trainer(
        model=model, args=training_args, train_dataset=small_train_dataset, eval_dataset=small_eval_dataset
    )

Is there any configuration needed to make the Trainer API use the GPU? If I run the native PyTorch version of the same tutorial example, the GPU is used correctly.
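
For reference, here is a minimal sketch of how one might check which GPUs PyTorch can see in the same environment before the Trainer is created. The helper name `describe_gpu_setup` is hypothetical and not part of the tutorial. It only reports what `torch.cuda` exposes and does not change any Trainer configuration. A CUDA out of memory error usually suggests the process did reach a GPU, so the reported memory per device may be the more useful number here.

```python
import importlib.util


def describe_gpu_setup():
    """Return a dict describing the GPUs visible to PyTorch (hypothetical helper)."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        # Nothing more to report without PyTorch in this environment.
        return info

    import torch

    info["cuda_available"] = torch.cuda.is_available()
    info["gpu_count"] = torch.cuda.device_count()
    info["gpus"] = []
    for index in range(info["gpu_count"]):
        props = torch.cuda.get_device_properties(index)
        info["gpus"].append(
            {
                "index": index,
                "name": props.name,
                # Total device memory in GiB, rounded for readability.
                "total_gib": round(props.total_memory / 1024**3, 1),
            }
        )
    return info


print(describe_gpu_setup())
```

If both GPUs are listed, `training_args.device` and `training_args.n_gpu` on the `TrainingArguments` object above should show which of them the Trainer intends to use.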