Hi there,
I am trying to run a simple forward_pass function over my dataset, but the map call below raises the following error:
dataset['train'].set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])
def forward_pass(batch):
    input_ids = torch.tensor(batch['input_ids']).to(device)
    attention_mask = torch.tensor(batch['attention_mask']).to(device)
    with torch.no_grad():
        batch['logits'] = model(input_ids, attention_mask)['logits'].cpu().numpy()
    return batch
dataset['train'].map(forward_pass, batched=True, batch_size=16)
TypeError: Provided `function` which is applied to all elements of table returns a `dict` of types [<class 'torch.Tensor'>, <class 'torch.Tensor'>, <class 'torch.Tensor'>, <class 'numpy.ndarray'>]. When using `batched=True`, make sure provided `function` returns a `dict` of types like `(<class 'list'>, <class 'numpy.ndarray'>)`.
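Reading the message literally, the returned dict holds three torch.Tensor columns (the formatted input columns, passed back unchanged) plus one numpy.ndarray (the new logits), and only list and numpy.ndarray are accepted. A minimal sketch of the kind of check the message describes is below. It is a reconstruction from the error text only, not the actual datasets source; check_batched_output and FakeTensor are hypothetical names, and FakeTensor merely stands in for torch.Tensor so the sketch runs without PyTorch.

```python
# Hypothetical reconstruction of the check implied by the error message.
# This is NOT the actual `datasets` implementation.
import numpy as np

# Column types the error message says are accepted when batched=True.
ALLOWED_COLUMN_TYPES = (list, np.ndarray)


class FakeTensor:
    """Stand-in for torch.Tensor (assumption: avoids a PyTorch dependency)."""


def check_batched_output(batch):
    """Raise TypeError if any returned column is not a list or NumPy array."""
    bad = [type(v) for v in batch.values()
           if not isinstance(v, ALLOWED_COLUMN_TYPES)]
    if bad:
        raise TypeError(f"unsupported column types in returned dict: {bad}")
    return True


# Columns formatted as NumPy arrays pass the check.
check_batched_output({"input_ids": np.zeros((2, 4)), "logits": np.zeros((2, 3))})
```

Under this reading, a batch whose input columns come back as tensors fails the check even though the new logits column is already a NumPy array, which matches the list of types printed in the error.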
The error does not occur when I set the format to numpy instead of torch:
dataset['train'].set_format(type='numpy', columns=['input_ids', 'attention_mask', 'label'])
Why is that the case? I can’t quite wrap my head around why the map call doesn’t handle tensor data but is fine with NumPy arrays. I’d be super grateful for any insights into the inner workings of map!
Best
Simon