Hi everyone,
I’ve created a Hugging Face Dataset that I’m utilizing within my Hugging Face Spaces applications. I’m looking for a way to automatically restart or update these Spaces whenever modifications are made to the underlying Dataset.
Could anyone guide me on how to set this up? Additionally, are there any alternative methods to achieve this functionality?
Hi! This is a common challenge. The most robust way to achieve this is by combining webhooks and GitHub Actions.
- Dataset on GitHub: Ensure your dataset is stored in a GitHub repository (or a similar version control system). This is crucial for triggering automated actions.
- Webhook on Hugging Face: Set up a webhook on your Hugging Face Space that listens for pushes to your dataset’s GitHub repository. You can do this in your Space’s settings under “Repository”. The webhook URL will be something like `https://huggingface.co/api/spaces/<your-username>/<your-space-name>/webhook`.
- GitHub Actions Workflow: Create a GitHub Actions workflow in your dataset’s repository. This workflow should be triggered by `push` events (whenever you update the dataset). Within the workflow, use a `curl` command (or a dedicated action like `curl-http-request`) to send a POST request to the webhook URL you got from Hugging Face.
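Here’s a basic example of the GitHub Actions workflow (`.github/workflows/update-space.yml`):

```yaml
name: Update Hugging Face Space
on:
  push:
    branches: [main]  # Or your main branch name
jobs:
  update-space:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger Space rebuild
        run: |
          # POST an empty JSON body to the webhook URL; replace the placeholders with your own values.
          # The URL is quoted so the shell does not treat < and > as redirections.
          curl -X POST -H "Content-Type: application/json" -d '{}' \
            "https://huggingface.co/api/spaces/<your-username>/<your-space-name>/webhook"
```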
This setup ensures that whenever you push changes to your dataset on GitHub, the webhook is triggered, and your Hugging Face Space rebuilds, using the updated data.
Alternative methods are less reliable. Directly modifying files within the Space’s repository isn’t recommended as it can lead to conflicts and doesn’t guarantee a rebuild. Using a separate script to periodically check for dataset updates is also less efficient than the webhook approach.
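For comparison, the periodic-check alternative could look roughly like the sketch below. It is only an illustration, not a tested setup: `check_and_restart` and `poll_forever` are hypothetical helper names, and the two callables you pass in (one that returns the dataset's current revision, one that restarts the Space) are assumptions you would have to supply yourself.

```python
import time
from typing import Callable, Optional


def check_and_restart(
    get_revision: Callable[[], str],
    restart: Callable[[], None],
    last_seen: Optional[str],
) -> str:
    """Run one polling step and return the revision to remember.

    restart() is called only when a revision was seen before and the
    current one differs from it, so the very first check never restarts.
    """
    current = get_revision()
    if last_seen is not None and current != last_seen:
        restart()
    return current


def poll_forever(
    get_revision: Callable[[], str],
    restart: Callable[[], None],
    interval_seconds: float = 300.0,
) -> None:
    """Check for a new revision every interval_seconds until interrupted."""
    last_seen: Optional[str] = None
    while True:
        last_seen = check_and_restart(get_revision, restart, last_seen)
        time.sleep(interval_seconds)


# Possible wiring, assuming the dataset is hosted on the Hugging Face Hub and
# the huggingface_hub package is installed (repo ids are placeholders):
#
#   from huggingface_hub import HfApi
#   api = HfApi()
#   poll_forever(
#       get_revision=lambda: api.dataset_info("<your-username>/<your-dataset>").sha,
#       restart=lambda: api.restart_space("<your-username>/<your-space-name>"),
#   )
```

The trade-off is the one mentioned above: the script has to run somewhere all the time, and the Space only picks up a change after the next polling interval.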
Hope this helps!
To @Adityarajpurohit from @Alanturner2 !