Save Data from Streamlit Session to Persist Changes to HF Datasets

I noticed a couple of Spaces where persistence appears to work for saving new records to a CSV file in HF Datasets.

I tried to reproduce it, but my version does not appear to save.

Can you provide insight on how to handle persistence when you modify data and then want to write it back from the cached in-memory version to a persistent, shared dataset?

Here was my last attempt:

Here are my last 3 failed attempts as Datasets:

These two Spaces appear to have it working, yet I cannot see how. Is it a key/secret or something?


Maybe @abidlabs or @osanseviero can help! It would be great to document how to do this in Streamlit in addition to Gradio.

To get it working, one option is to use the datasets library with its push_to_hub method.
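A minimal sketch of the datasets route, assuming the edited data is held in memory as a list of row dicts and that HF_TOKEN is a write token stored as a Space secret. The helper names and the repo ID are hypothetical. The rows are rebuilt into a `Dataset` with `Dataset.from_dict` and then pushed with `push_to_hub`.

```python
import os


def append_record(records, new_record):
    """Return a new list with new_record appended (the input list is left unchanged)."""
    return [*records, dict(new_record)]


def records_to_columns(records):
    """Turn row dicts into the column-oriented dict that Dataset.from_dict expects.

    Missing keys in a row are filled with None so every column has the same length.
    """
    keys = list(dict.fromkeys(key for row in records for key in row))
    return {key: [row.get(key) for row in records] for key in keys}


def push_records(records, repo_id):
    """Rebuild a Dataset from the in-memory rows and push it to a Hub dataset repo."""
    # Imported here so the pure helpers above work without `datasets` installed.
    from datasets import Dataset

    ds = Dataset.from_dict(records_to_columns(records))
    # Assumes a write token is stored as the HF_TOKEN secret of the Space.
    ds.push_to_hub(repo_id, token=os.environ.get("HF_TOKEN"))
```

In a Streamlit app, the rows could live in `st.session_state` and be written back from a button handler, e.g. `push_records(st.session_state["records"], "your-username/your-dataset")`. The session key and repo ID here are placeholders. Each call re-uploads the whole dataset, so this approach suits small datasets.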

app.py · julien-c/persistent-data at main and app.py · chrisjay/afro-speech at main are working examples that use huggingface_hub (a Python library that wraps the Hugging Face Hub public APIs), specifically its Repository class.

  1. Create or clone a repo using Repository (see app.py · julien-c/persistent-data at main). This uses a token, HF_TOKEN, which is stored as a secret in the Space settings and made available to the app. Note that a local directory is also specified.
  2. Save your data in that local directory and push it back to the Hub. For example, the first Space appends each new record to a CSV file.
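A minimal sketch of those two steps, assuming a dataset repo URL of your own and a CSV named `data.csv` (both placeholders; the function names are hypothetical too). It uses the legacy git-based `Repository` class, which needs git and git-lfs installed and is deprecated in recent huggingface_hub releases; older releases named the token argument `use_auth_token`. The final `push_to_hub` call is what actually writes the change back: without it, edits stay only in the Space's local clone.

```python
import csv
import os

DATA_FILENAME = "data.csv"  # hypothetical file name inside the dataset repo


def append_row_to_csv(path, row, fieldnames):
    """Append one row to a CSV file, writing the header first if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(row)


def save_with_repository(row, fieldnames, dataset_url, local_dir="data"):
    """Clone (or reuse) the dataset repo locally, append a row, and push the commit."""
    # Legacy API; assumes a huggingface_hub version that still ships Repository.
    from huggingface_hub import Repository

    repo = Repository(
        local_dir=local_dir,
        clone_from=dataset_url,  # e.g. "https://huggingface.co/datasets/<user>/<name>"
        token=os.environ.get("HF_TOKEN"),  # secret set in the Space settings
    )
    repo.git_pull()  # pick up rows written by other sessions first
    append_row_to_csv(os.path.join(local_dir, DATA_FILENAME), row, fieldnames)
    repo.push_to_hub(commit_message="Append row from Streamlit session")
```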

huggingface_hub also has an upload_file method, which may be more intuitive: it uploads one file at a time to a given dataset repo. See app.py · chrisjay/afro-speech at main.
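A minimal sketch of the upload_file route, assuming the data is kept in memory as a list of row dicts (for example in `st.session_state`) and that the target is `data.csv` in a dataset repo you own. The file name, repo ID, and helper names are placeholders. The rows are serialized to CSV bytes, which `upload_file` accepts directly through `path_or_fileobj`, so nothing has to be written to disk first.

```python
import csv
import io
import os


def rows_to_csv_bytes(rows, fieldnames):
    """Serialize in-memory rows (e.g. kept in st.session_state) to UTF-8 CSV bytes."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def upload_rows(rows, fieldnames, repo_id, path_in_repo="data.csv"):
    """Overwrite path_in_repo in a Hub dataset repo with the current in-memory rows."""
    # Imported here so rows_to_csv_bytes works without huggingface_hub installed.
    from huggingface_hub import upload_file

    upload_file(
        path_or_fileobj=rows_to_csv_bytes(rows, fieldnames),
        path_in_repo=path_in_repo,
        repo_id=repo_id,
        repo_type="dataset",
        token=os.environ.get("HF_TOKEN"),  # write token stored as a Space secret
        commit_message="Update data from Streamlit session",
    )
```

Each upload replaces the whole file, so if several sessions write at once, the last upload wins. To pick up other users' changes, re-download the current file (for example with `hf_hub_download`) and merge it with your rows before uploading.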