IterableDataset compute feature mean and create histogram

Hi!

I have an IterableDataset created using streaming, and I want to compute the mean of a feature named “num_tokens.” It’s a huge dataset that doesn’t fit in memory, so converting it to a Pandas DataFrame is apparently not an option…

I’ve been reading that this could be accomplished using .map(), but I haven’t been able to do it.

I also want to graph this column in a histogram using something like this:

sns.displot(data['num_tokens'], bins=100, kde=True)

Is this even possible?
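
To make the question concrete, here is a rough sketch of the kind of single-pass approach I have in mind: keep a running sum and count for the mean, and count values into fixed bins for the histogram, so nothing has to be held in memory. The function name `streaming_mean_and_histogram` and the `value_range` bounds are my own assumptions (the bounds have to be guessed up front), and it only relies on iterating the dataset row by row as dicts.

```python
from typing import Iterable, List, Mapping, Tuple


def streaming_mean_and_histogram(
    rows: Iterable[Mapping[str, float]],
    column: str = "num_tokens",
    bins: int = 100,
    # Assumed bounds for the column; adjust to the real data.
    value_range: Tuple[float, float] = (0.0, 4096.0),
) -> Tuple[float, List[int]]:
    """Return (mean, per-bin counts) for `column` in one pass over `rows`."""
    low, high = value_range
    width = (high - low) / bins
    counts = [0] * bins
    total = 0.0
    n = 0
    for row in rows:
        value = row[column]
        total += value
        n += 1
        # Values outside value_range are clamped into the first or last bin.
        index = int((value - low) / width)
        counts[min(max(index, 0), bins - 1)] += 1
    mean = total / n if n else float("nan")
    return mean, counts
```

The idea would be to call it as `mean, counts = streaming_mean_and_histogram(dataset)` and then draw `counts` as a bar plot. As far as I can tell, `kde=True` would not work from bin counts alone, since the density estimate needs the raw values or at least a sample of them.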

Thank you very much in advance!