set_format('torch') returns a list of tensors for samples with multiple entries

In my experiment, I concatenate one question with each of 10 possible answers to generate QA pairs, so that a language model can directly evaluate the perplexity of each answer. Each item in the dataset therefore has input_ids of shape 10 (n_answer) x 64 (seq length).
However, when I call set_format('torch') on the dataset, each item becomes a list of 10 tensors of shape (64,) instead of a single (10, 64) tensor.
Is there a way to make it return a matrix?

The code to reproduce the problem:

import torch
from datasets import load_dataset
from transformers import GPT2Tokenizer

def tokenize_func(examples, tokenizer):
    # Create question answer pair for every choice
    choices = examples['choices']
    duplicated_question = [examples['question']] * len(choices)
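    tokenized = tokenizer(duplicated_question,
                          choices,  # assumed: pair the question with each choice
                          padding='max_length',  # batch size has to be 1
                          max_length=64,  # assumed from the seq length stated above
                          truncation=True)
    return tokenized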

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
val_dataset = load_dataset('drt/kqa_pro', 'train_val', split='validation[:1%]')
val_dataset = val_dataset.map(tokenize_func, fn_kwargs={'tokenizer': tokenizer})
val_dataset = val_dataset.remove_columns(['question', 'sparql', 'program', 'choices', 'answer'])
val_dataset.set_format('torch')  # each item now comes back as a list of (64,) tensors

Moreover, even if I manually stack the list of tensors, each item is still returned as a list of tensors:

val_dataset = val_dataset.map(lambda example: {k: torch.stack(v) for k, v in example.items()})

This looks very weird to me.
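
A note on why stacking inside map probably has no visible effect: as far as I understand, map writes its output back to the underlying Arrow table, so the stacked tensor is stored as nested lists again and re-formatted on the next access. A possible workaround on an affected version is to stack when the item is read rather than when it is stored. The sketch below is only an illustration: stack_fields is a hypothetical helper, not part of datasets, and the item is dummy data standing in for one formatted example.

```python
import torch

def stack_fields(item):
    """Stack every list-of-tensors field of one example into a single tensor."""
    return {
        key: torch.stack(value) if isinstance(value, (list, tuple)) else value
        for key, value in item.items()
    }

# Dummy stand-in for one formatted example: 10 answers x 64 tokens.
item = {
    "input_ids": [torch.zeros(64, dtype=torch.long) for _ in range(10)],
    "attention_mask": [torch.ones(64, dtype=torch.long) for _ in range(10)],
}

stacked = stack_fields(item)
print(stacked["input_ids"].shape)  # torch.Size([10, 64])
```

With a real dataset this would be applied per item at read time, e.g. stack_fields(val_dataset[0]), or inside a DataLoader collate function.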

Hi! Which version of datasets are you using? Can you try updating to the latest version?

It works like magic…
I was on version 2.4.0.
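
For anyone else landing here, a quick way to check which version is installed before upgrading. This is a small sketch using only the standard library; installed_version is a hypothetical helper name, and it returns None when the package is missing.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

datasets_version = installed_version("datasets")
print("datasets:", datasets_version or "not installed")
```

If this prints 2.4.0 (the version that showed the problem in this thread), upgrading with pip install -U datasets is the fix that worked here.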