Hi!
I have an IterableDataset created using streaming, and I want to compute the mean of a feature named "num_tokens". It's a huge dataset that doesn't fit in memory, so converting it to a Pandas DataFrame is apparently not an option...
I've been reading that this could be accomplished using .map(), but I haven't been able to do it.
I also want to graph this column in a histogram using something like this:
sns.displot(data["num_tokens"], bins=100, kde=True)
Is this even possible?
Thank you very much in advance!
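
For what it's worth, here is a minimal sketch of one way both the mean and the histogram might be computed in a single pass, without loading the dataset into memory. It assumes each example arrives as a dict with a numeric "num_tokens" field, which is how a streamed dataset yields rows when iterated. The function name, the bin count, and the `lo`/`hi` range are hypothetical choices for illustration, and values outside the range are clamped into the first or last bin.

```python
from typing import Iterable, List, Mapping, Tuple


def streaming_mean_and_histogram(
    rows: Iterable[Mapping[str, float]],
    column: str = "num_tokens",
    bins: int = 100,
    lo: float = 0,
    hi: float = 4096,  # assumed upper bound; adjust to the data
) -> Tuple[float, List[int], List[float]]:
    """Make one pass over `rows` and return (mean, bin counts, bin edges)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    total = 0.0
    n = 0
    for row in rows:
        value = row[column]
        total += value
        n += 1
        # Map the value to a fixed-width bin, clamping out-of-range values.
        idx = int((value - lo) / width)
        idx = min(max(idx, 0), bins - 1)
        counts[idx] += 1
    mean = total / n if n else float("nan")
    edges = [lo + i * width for i in range(bins + 1)]
    return mean, counts, edges


# Small in-memory stand-in for the streamed dataset.
demo_rows = [{"num_tokens": v} for v in (10, 20, 30, 40)]
mean, counts, edges = streaming_mean_and_histogram(demo_rows, bins=4, lo=0, hi=40)
print(mean, counts)

# With a real streamed dataset (the path is a placeholder):
#   from datasets import load_dataset
#   ds = load_dataset("path/to/dataset", split="train", streaming=True)
#   mean, counts, edges = streaming_mean_and_histogram(ds)
# The pre-binned counts can then be drawn as bars, for example:
#   import matplotlib.pyplot as plt
#   plt.bar(edges[:-1], counts, width=edges[1] - edges[0], align="edge")
```

Only the running sum, the row count, and the bin counts are kept in memory, so the cost does not grow with the dataset size. The trade-off is that the bin range has to be chosen up front, and a kde curve like the one in the sns.displot call is not available from binned counts alone.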