Clarification on Dataset Size Discrepancy – Common Pile v0.1


Possible.


If the dataset is gated, I think you can still see its size. If it's private, nobody except the owner can see it, including me. If the repository relies on a dataset builder (loading) script, the displayed size may be small, because the repo holds only the script and not the data it downloads. From what I've seen, though, most dataset repositories simply store the data files themselves.
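If you want to check the stored size yourself, here is a minimal sketch, assuming the `huggingface_hub` library is installed and the repo is public or you pass a token with access. It asks the Hub for per-file metadata via `HfApi().dataset_info(..., files_metadata=True)` and adds up the file sizes. The helper names `sum_sibling_sizes` and `dataset_total_bytes` are hypothetical, not part of any library. Note that for a script-based repo this only counts the files in the repo (e.g. the script), not the data the script downloads.

```python
from typing import Iterable, Optional


def sum_sibling_sizes(siblings: Iterable) -> int:
    """Hypothetical helper: add up the `size` of each repo file, skipping unknown sizes."""
    return sum(s.size for s in siblings if getattr(s, "size", None) is not None)


def dataset_total_bytes(repo_id: str, token: Optional[str] = None) -> int:
    """Hypothetical helper: total bytes of the files stored in a dataset repo."""
    # Imported lazily so sum_sibling_sizes works even without huggingface_hub.
    from huggingface_hub import HfApi

    # files_metadata=True asks the Hub to include per-file sizes in `siblings`.
    info = HfApi().dataset_info(repo_id, files_metadata=True, token=token)
    return sum_sibling_sizes(info.siblings or [])
```

Usage would look like `dataset_total_bytes("some-org/some-dataset")`, where the repo id is a placeholder. If the result is tiny compared to what you expected, the repo most likely contains a builder script rather than the data itself.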


I think it would be quicker to ask the author.