How to Fix "Workload Evicted, Storage Limit Exceeded (50G)" Error in HuggingFace Spaces

:cross_mark: Workload evicted, storage limit exceeded (50G) :cross_mark: No logs available :cross_mark: Container fails to start

Root Cause:

  • torch.hub.load() downloads entire YOLOv5 repository to permanent storage
  • Model cache saves to default location (counts against 50GB limit)
  • No optimization for cloud deployment

:white_check_mark: Solution 1: Redirect Cache to /tmp (RECOMMENDED)

This solution redirects all downloads and cache to temporary storage that doesn’t count against your limit.

Step 1: Update app.py

Add these lines at the very top of your app.py (before any imports):

import os
import torch

# CRITICAL: Redirect cache to temporary storage
os.environ['TORCH_HOME'] = '/tmp/torch_cache'
os.environ['HUB_DIR'] = '/tmp/torch_hub'
os.environ['TMPDIR'] = '/tmp'
torch.hub.set_dir('/tmp/torch_hub')

Step 2: Add Streamlit Caching

Replace your model loading function with:

import streamlit as st

@st.cache_resource  # Cache model to load only once
def load_model():
    try:
        model = torch.hub.load(
            "ultralytics/yolov5", 
            "custom", 
            path="best.pt", 
            force_reload=False,  # Don't re-download
            trust_repo=True,
            verbose=False,
            skip_validation=True
        )
        model.eval()
        model.conf = 0.25
        return model, None
    except Exception as e:
        return None, str(e)

# Load model once
model, error = load_model()

Step 3: Update requirements.txt

Replace with this minimal, optimized version:

streamlit
pillow
torch
torchvision
opencv-python-headless
numpy
pandas
pyyaml
requests
scipy
tqdm
matplotlib
seaborn
ultralytics

Step 4: Create packages.txt

Create a new file named packages.txt with:

libgl1
libglib2.0-0

Step 5: Commit Changes

  1. Commit all file changes
  2. Wait 2-3 minutes for automatic rebuild
  3. Check logs for success

:white_check_mark: Expected Result:

  • Storage usage: ~8-10GB (instead of 50GB+)
  • Model loads once and stays cached in /tmp
  • Fast inference after initial load

:white_check_mark: Solution 2: Use Persistent Storage (PAID OPTION)

If you need permanent storage for large files or databases, use HuggingFace’s persistent storage feature.

Step 1: Enable Persistent Storage

  1. Go to your Space Settings
  2. Scroll to “Persistent Storage” section
  3. Click “Add Persistent Storage”
  4. Select storage size: 20GB (minimum recommended)
  5. Click “Create” and confirm payment

Step 2: Configure Environment Variables

In Space Settings → Environment Variables, add:

Variable Value
HF_HOME /data/.huggingface
TORCH_HOME /data/torch_cache
HUB_DIR /data/torch_hub

Step 3: Update app.py for Persistent Storage

Add at the top of app.py:

import os

# Use persistent storage
os.environ['HF_HOME'] = '/data/.huggingface'
os.environ['TORCH_HOME'] = '/data/torch_cache'
os.environ['HUB_DIR'] = '/data/torch_hub'

# Create directories if they don't exist
os.makedirs('/data/.huggingface', exist_ok=True)
os.makedirs('/data/torch_cache', exist_ok=True)
os.makedirs('/data/torch_hub', exist_ok=True)

Step 4: Keep Optimized Requirements

Use the same requirements.txt from Solution 1:

streamlit
pillow
torch
torchvision
opencv-python-headless
numpy
pandas
pyyaml
requests
scipy
tqdm
matplotlib
seaborn
ultralytics

Step 5: Factory Reboot

  1. Go to Settings → Scroll to bottom
  2. Click “Factory reboot” button
  3. Wait 3-5 minutes for rebuild

:white_check_mark: Expected Result:

  • Files persist across restarts
  • No repeated downloads
  • Faster startup times
  • Better for production apps

:money_bag: Cost: ~$5-10/month depending on storage size


:counterclockwise_arrows_button: Solution 3: Factory Reboot (Quick Fix)

If you’ve made code changes but Space is still failing, try a clean rebuild.

When to Use Factory Reboot:

  • :white_check_mark: After updating app.py or requirements.txt
  • :white_check_mark: When Space is stuck in “Building” state
  • :white_check_mark: After storage errors
  • :white_check_mark: When cache seems corrupted

Step-by-Step Factory Reboot:

  1. Navigate to Settings

    • Go to your Space URL
    • Click :gear: Settings tab at the top
  2. Scroll to Bottom

    • Look for “Factory reboot” section
    • You’ll see a red warning message
  3. Click Factory Reboot

    • Click the “Factory reboot” button
    • Confirm the action in popup dialog
  4. Monitor Rebuild

    • Switch to “Logs” tab
    • Watch the build process (2-5 minutes)
    • Look for “Model loaded successfully!” message
  5. Check Application

    • Click “App” tab
    • Test your application
    • Upload an image to verify detection works

What Factory Reboot Does:

  • :wastebasket: Clears all temporary files and cache
  • :counterclockwise_arrows_button: Rebuilds Docker container from scratch
  • :package: Reinstalls all packages fresh
  • :rocket: Applies all code changes properly

:warning: Important Notes:

  • Factory reboot deletes all non-persistent data
  • Model will need to re-download (one time)
  • Takes 2-5 minutes to complete
  • Use this after making code fixes, not before

:bar_chart: Comparison Table

Feature Solution 1 (Free) Solution 2 (Paid) Solution 3 (Reboot)
Cost Free $5-10/month Free
Storage Type Temporary (/tmp) Persistent (/data) Clears everything
Data Persistence Lost on restart Kept forever Rebuilds fresh
Setup Complexity Medium Easy Very Easy
Best For Most use cases Production apps Troubleshooting
Storage Limit Fix :white_check_mark: Yes :white_check_mark: Yes :warning: Temporary
Implementation Time 10 minutes 5 minutes 2 minutes
Recurring Cost None Monthly None

:magnifying_glass_tilted_left: Troubleshooting Common Issues

Issue 1: “No module named ‘ultralytics’”

Solution:

# Add to requirements.txt
ultralytics

Issue 2: “Package ‘libgl1-mesa-glx’ has no installation candidate”

Solution:

# Update packages.txt to use new package name
libgl1
libglib2.0-0

Issue 3: Model still downloading to wrong location

Solution:

# Add BEFORE importing torch
import os
os.environ['TORCH_HOME'] = '/tmp/torch_cache'

# Then import
import torch
torch.hub.set_dir('/tmp/torch_hub')

Issue 4: Build fails with “exit code: 1”

Solution:

  1. Remove version numbers from requirements.txt
  2. Use simple package names only
  3. Factory reboot

Issue 5: Space runs but shows blank screen

Solution:

  1. Check logs for errors
  2. Verify best.pt file is in root directory
  3. Ensure file is not Git LFS pointer
  4. Factory reboot

:chart_increasing: Performance Metrics

Before Optimization:

  • :bar_chart: Storage Used: 50GB+ (Failed)
  • :stopwatch: Build Time: Failed / Timeout
  • :counterclockwise_arrows_button: Model Load: Every request
  • :floppy_disk: Cache Location: /root (permanent)
  • :package: PyTorch Size: ~4GB (full version)

After Optimization:

  • :bar_chart: Storage Used: ~8-10GB :white_check_mark:
  • :stopwatch: Build Time: 2-3 minutes :white_check_mark:
  • :counterclockwise_arrows_button: Model Load: Once per session :white_check_mark:
  • :floppy_disk: Cache Location: /tmp (temporary) :white_check_mark:
  • :package: PyTorch Size: ~2GB (optimized) :white_check_mark:

Recommended Approach

For Free Tier Users (Most People):

  1. :white_check_mark: Use Solution 1 (redirect cache to /tmp)
  2. :white_check_mark: Optimize requirements.txt
  3. :white_check_mark: Use @st.cache_resource for model loading
  4. :white_check_mark: Factory reboot after changes

For Production Apps:

  1. :white_check_mark: Use Solution 2 (persistent storage)
  2. :white_check_mark: Set up proper environment variables
  3. :white_check_mark: Implement error handling and logging
  4. :white_check_mark: Monitor storage usage regularly

For Quick Testing:

  1. :white_check_mark: Make code changes
  2. :white_check_mark: Use Solution 3 (factory reboot)
  3. :white_check_mark: Check logs for errors
  4. :white_check_mark: Test functionality

Quick Start Checklist

Immediate Actions (5 minutes):

  • Update app.py with cache redirect code
  • Replace requirements.txt with optimized version
  • Create packages.txt with correct package names
  • Commit all changes
  • Factory reboot your Space

Verification Steps:

  • Check build logs for success
  • Verify “Model loaded successfully!” message
  • Test image upload and detection
  • Monitor storage usage in Space settings
  • Confirm app loads within 30 seconds

Optional Enhancements:

  • Create .streamlit/config.toml for UI customization
  • Add example images (< 500KB each)
  • Implement download button for results
  • Add error handling and user feedback
  • Set up persistent storage (if needed)

Key Takeaways

  1. Always redirect torch cache to /tmp in HuggingFace Spaces
  2. Use @st.cache_resource to prevent repeated model loading
  3. Set force_reload=False in torch.hub.load()
  4. Use minimal requirements.txt without version pins
  5. Factory reboot after making significant changes
  6. Monitor logs during build process
  7. Test thoroughly after deployment
  8. Consider persistent storage for production apps

:sos_button: Still Having Issues?

If you’re still experiencing problems after following all solutions:

  1. Check Logs: Click “Logs” tab to see detailed error messages
  2. Verify Files: Ensure all files are updated correctly
  3. Try Multiple Reboots: Sometimes 2-3 factory reboots help
  4. Clear Browser Cache: Refresh with Ctrl+Shift+R
  5. Contact Support: Share logs with HuggingFace support team
  6. Community Help: Post on HuggingFace forums with error details

:white_check_mark: Success Indicators

Your Space is working correctly when you see:

:white_check_mark: Build completes in 2-5 minutes
:white_check_mark: “Model loaded successfully!” in logs
:white_check_mark: App loads without errors
:white_check_mark: Storage usage < 10GB
:white_check_mark: Image detection works smoothly
:white_check_mark: No “evicted” or “storage limit” errors


:tada: Conclusion

The 50GB storage limit error is easily fixable by:

  1. Redirecting cache to temporary storage
  2. Optimizing dependencies
  3. Using proper Streamlit caching

Solution 1 (free) works for 95% of use cases. Use Solution 2 (paid) only if you need persistent storage. Use Solution 3 (factory reboot) whenever you make code changes.

Follow the step-by-step guide above, and your YOLOv5 Seatbelt Detection Space will run smoothly! :rocket:


Good luck with your deployment! :flexed_biceps:

Last Updated: October 2025

1 Like