Hi! I have been trying to write a function that outputs a list and map it over my dataset. However, when I export the result to a CSV or a DataFrame, the value shows up as a NumPy array instead of a list. Say I have the following function:
def fetch_embedding(data):
    text = data["text"]
    out = trainer.predict(text)
    embeddings = out[0][1][-1][:,0,:]
    embeddings = embeddings.tolist()
    return {"embeddings" : embeddings}
So I'm very intentionally converting the torch tensor to a list. Then I map the function over the dataset to save these embeddings:
dataset = dataset.map(fetch_embedding)
We check whether the dataset stored the list as an actual list:
df = dataset.to_pandas()
A = df.iloc[0].loc["embeddings"]
print(type(A))
The output of this is the following:
<class 'numpy.ndarray'>
Is there any way to have the output of the map stored as a list, instead of it being converted to a NumPy array?
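For reference, the only workaround I can think of is converting the cells back after the fact. Below is a minimal sketch of that idea, not a fix for the map itself. It uses a hand-built DataFrame as a stand-in for the result of to_pandas(), so the column values are made up and only pandas and NumPy are involved:

```python
import numpy as np
import pandas as pd

# Stand-in for the frame returned by dataset.to_pandas(): each cell of the
# "embeddings" column holds a NumPy array (the values here are made up).
df = pd.DataFrame({
    "text": ["first example", "second example"],
    "embeddings": [np.array([0.1, 0.2, 0.3]), np.array([0.4, 0.5, 0.6])],
})

# Convert every array cell back into a plain Python list.
df["embeddings"] = df["embeddings"].apply(lambda arr: arr.tolist())

A = df.iloc[0].loc["embeddings"]
print(type(A))
```

This works on the DataFrame only, so it has to be repeated after every conversion, which is why I would prefer a way to keep the list type in the first place.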