I created a Space in which the dataset I convert to a pandas DataFrame began downloading at build time.
The log showed it apparently downloading multiple times… I believe that part is actually loading the data into different cache or parquet files. Is that correct?
Next, when that step completed, it appeared to copy to an image. I'm thinking that is the Git LFS push / image copy of whatever is needed to start the Space. Is that correct?
Last, when it finished that step, it said "Storage Exceeded - Space Evicted." That makes sense; maybe the dataset was too large.
What are the guidelines on my max storage per Space for loading a dataset? I know the 2B-record dataset will not work because it exceeds 2 GB, which I read is the Git LFS limit.
I am trying a smaller one now. Sorry if it is still too big; I will know in a few minutes…
If I would like to pay for additional storage for a single Space, is there a rate chart for what I can pay? Does that scale with the other hardware/GPU choices offered when a new Space is created? I guess I should read that in detail.
If I want to demonstrate a large file for users at the Pro rate, what is roughly the max size I should attempt? I know the 200 MB datasets I have work fine.
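One way to sanity-check a candidate dataset before committing to a download: `load_dataset_builder` fetches only the dataset's metadata, which usually includes download and on-disk sizes. A sketch (the `human_size` helper is my own; the metadata call assumes network access and the `datasets` library):

```python
def human_size(num_bytes: float) -> str:
    """Format a byte count as a human-readable string."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} PB"

# Metadata-only lookup: no data files are downloaded.
# Assumes the `datasets` library is installed and the Hub is reachable.
try:
    from datasets import load_dataset_builder
    builder = load_dataset_builder("laion/laion-coco")
    print("download size:", human_size(builder.info.download_size or 0))
    print("size on disk: ", human_size(builder.info.dataset_size or 0))
except Exception as exc:
    print("metadata lookup skipped:", exc)
```

If the reported size is anywhere near the Space's storage, it is a sign the Space will be evicted again.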
Below is my relatively uneducated first attempt at the source code:
import pandas as pd
import gradio as gr
from datasets import load_dataset
# dataset = load_dataset("laion/laion2B-en-joined")  # too big: space evicted
dataset = load_dataset("laion/laion-coco")  # try smaller?
df = pd.DataFrame(dataset["train"])  # load_dataset returns a DatasetDict, so index a split
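Since the full download is what fills the disk, one alternative worth trying is streaming mode, which reads records over HTTP instead of writing the whole dataset to the Space's storage. A sketch (assumes network access and that `laion/laion-coco` exposes a `train` split; the `head_to_df` helper is my own):

```python
from itertools import islice

import pandas as pd

def head_to_df(records, n: int) -> pd.DataFrame:
    """Materialize only the first n records of any iterable into a DataFrame."""
    return pd.DataFrame(list(islice(iter(records), n)))

# streaming=True yields records lazily over HTTP, so the Space's disk
# never holds the full dataset. Assumes the `datasets` library and network.
try:
    from datasets import load_dataset
    stream = load_dataset("laion/laion-coco", split="train", streaming=True)
    df = head_to_df(stream, 1000)  # a demo only needs a sample
    print(df.shape)
except Exception as exc:
    print("streaming demo skipped:", exc)
```

For a classroom Gradio demo, showing the first thousand rows this way sidesteps the storage limit entirely.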
Thanks again! You rock as the platform best able to handle large datasets and models. Any answers will be passed on to the many curious students I teach on Thursdays… Much appreciated!
Check out the hugging_face.glb in this 2D-to-3D Space example. This is the only platform where you can do something like this. So cool: 🎨3DfromImg.GLB🎈 - a Hugging Face Space by awacke1