🚨 Inference Failure on HF Spaces: Keras model gives incorrect predictions despite identical weights and inputs (works on Colab and locally)

Description:

I’ve created a Keras-based neural network model (.keras file) that predicts stock closing prices based on [Open, High, Low, Close, Volume]. The model:

  • Works perfectly on my local machine (MacBook M4, CPU+GPU via tensorflow-metal)
  • Works correctly on Google Colab (CPU-only)
  • Fails on Hugging Face Spaces, producing consistently wrong predictions with the same input

Reproducible Test:

Model Input: [0.5, 0.5, 0.5, 0.5, 0.5]
Expected Prediction (unscaled): ~78.33
HF Prediction (unscaled): 23.62
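
For reference, this is roughly the code path that produces those numbers (a minimal sketch; the file names are the ones listed under "Files" below, and loading the .pkl scalers with joblib is an assumption, plain pickle works the same way):

```python
import joblib
import numpy as np
from tensorflow import keras

# Minimal reproduction of the test case above. Assumes NN_model.keras and
# scaler_y.pkl sit next to the app script.
model = keras.models.load_model("NN_model.keras")
scaler_y = joblib.load("scaler_y.pkl")

x = np.array([[0.5, 0.5, 0.5, 0.5, 0.5]], dtype=np.float32)  # already scaled input
y_scaled = model.predict(x, verbose=0)
y = scaler_y.inverse_transform(y_scaled)
print(y)  # ~78.33 locally and on Colab, 23.62 on HF Spaces
```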


Environment Info:

Environment      TF Version   sklearn   Prediction
Local (Mac M4)   2.16.2       1.6.1     ✅ correct
Colab (CPU)      2.16.2       1.6.1     ✅ correct
HF Spaces        2.16.2       1.6.1     ❌ incorrect

Files:

  • Model file: NN_model.keras (verified via MD5 hash)
  • Scalers: scaler_X.pkl, scaler_y.pkl (min/max match)
  • Sample input: [0.5] * 5
  • First-layer weight mean: -0.010000, identical in all environments (verified with the snippet after this list)
  • Output discrepancy is repeatable and deterministic
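
The hash/weight check looks like this (a sketch; it assumes the first layer that carries weights is the one whose mean is reported above):

```python
import hashlib
import numpy as np
from tensorflow import keras

# Same check run in every environment: hash the artifact on disk, then inspect
# the deserialized weights to confirm the model really is identical after loading.
with open("NN_model.keras", "rb") as f:
    print("MD5:", hashlib.md5(f.read()).hexdigest())

model = keras.models.load_model("NN_model.keras")
first = next(layer for layer in model.layers if layer.get_weights())
kernel = first.get_weights()[0]
print("first-layer weight mean:", float(np.mean(kernel)))  # -0.010000 everywhere
```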

Additional Notes:

  • I have also tried the .h5 format and a SavedModel export via model.export(). All formats give correct predictions everywhere except on Hugging Face Spaces.
  • Inputs, scalers, and model all log correctly inside the Space; it is only the .predict() result that is wrong (a layer-by-layer probe like the sketch after this list should show where the outputs start to diverge).
  • This behavior suggests either a backend-level deserialization bug or a numerical inconsistency tied to the Hugging Face container environment.
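
A simple way to localize the divergence would be to push the input through the layers one at a time and compare the trace between Colab and the Space (a sketch that assumes the model is a plain sequential stack, so calling the layers in order reproduces the forward pass):

```python
import numpy as np
from tensorflow import keras

model = keras.models.load_model("NN_model.keras")
x = np.array([[0.5, 0.5, 0.5, 0.5, 0.5]], dtype=np.float32)

# Print every activation; the first layer whose output differs between
# environments is where the numerical discrepancy enters.
h = x
for layer in model.layers:
    if isinstance(layer, keras.layers.InputLayer):
        continue
    h = layer(h)
    print(layer.name, np.asarray(h).ravel()[:5])
```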

What I’d Like to Know:

  • Is there any known limitation or incompatibility when running Keras .keras or .h5 models inside Spaces?
  • Is the current HF Spaces CPU container running TensorFlow with any non-standard optimizations (e.g., oneDNN, XLA) that might explain the discrepancy? (The snippet after this list is how I would check.)
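
One way to test the oneDNN/XLA theory inside the Space would be something like this (a sketch; TF_ENABLE_ONEDNN_OPTS is only read at import time, which is why it is set before importing TensorFlow):

```python
import os
# Disable oneDNN custom ops before TensorFlow is imported, then re-run the
# prediction to see whether the oneDNN kernels are responsible.
os.environ.setdefault("TF_ENABLE_ONEDNN_OPTS", "0")

import tensorflow as tf

print(tf.__version__)
print(tf.sysconfig.get_build_info())   # compiler / build details of the installed wheel
print(tf.config.optimizer.get_jit())   # XLA JIT setting; empty string means off
```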

Reproduction Link (Optional):

I can provide a minimal public Space demonstrating this issue with logging upon request.

Thank you — I’d love to help resolve this and make Spaces more robust for ML deployments!


If there is a difference compared to Colab, the suspect is a different version of some serialization-related library in the dependencies.

Does it still happen if you pin it to a slightly older version? For example, change

huggingface_hub

to

huggingface_hub==0.25.2
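
In requirements.txt that would look roughly like this (the TensorFlow and scikit-learn pins are just the versions from your table; treat all three as something to experiment with, not a confirmed fix):

```text
tensorflow==2.16.2
scikit-learn==1.6.1
huggingface_hub==0.25.2
```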