Issue with ALLaM-7B Model in Inference API - Size Limitation Error

Hi everyone,

I’m experiencing an issue with the ALLaM-7B model when trying to use it through the Hugging Face Inference API. Although the model is listed as supported in the HF Inference API documentation, I’m getting an error saying the model is too large to be loaded automatically.

Here’s the error message I’m seeing:

Cannot access content at: https://router.huggingface.co/hf-inference/models/ALLaM-AI/ALLaM-7B-Instruct-preview/v1/chat/completions. Make sure your token has the correct permissions. The model ALLaM-AI/ALLaM-7B-Instruct-preview is too large to be loaded automatically (14GB > 10GB).
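
For reference, a minimal call that reproduces the error looks roughly like this (a sketch, not my exact code; the token is a placeholder and the message content is illustrative):

    import requests

    # OpenAI-compatible chat completions route from the error message above
    API_URL = "https://router.huggingface.co/hf-inference/models/ALLaM-AI/ALLaM-7B-Instruct-preview/v1/chat/completions"
    headers = {"Authorization": "Bearer hf_xxx"}  # placeholder token

    payload = {
        "model": "ALLaM-AI/ALLaM-7B-Instruct-preview",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64,
    }

    response = requests.post(API_URL, headers=headers, json=payload)
    # Prints the "too large to be loaded automatically (14GB > 10GB)" error
    print(response.status_code, response.text)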

Has anyone else encountered this issue? Are there any known workarounds or solutions for using the ALLaM-7B model with the Inference API? I’d appreciate any guidance or advice on how to resolve this issue.

Thanks

Model details:

  • Model name: ALLaM-7B
  • Endpoint: https://api-inference.huggingface.co/models/ALLaM-AI/ALLaM-7B-Instruct-preview
  • Error message: Size limitation error (14GB > 10GB)

Models larger than 10GB are not available through the Serverless Inference API unless Hugging Face grants the model an individual exception.

Also, the Serverless Inference API as a whole is currently undergoing major changes, so it’s hard to say how this will behave going forward…

If you really want to use it online, consider Spaces, a dedicated Inference Endpoint, or the free tier of Google Colab; a rough sketch of the Colab route follows below.
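
For the Colab (or any local GPU) route, here is a minimal sketch. It assumes a GPU runtime with transformers, accelerate, and bitsandbytes installed; the 4-bit quantization is my suggestion for fitting the ~14GB fp16 weights into the ~15GB of VRAM on a free T4, and I’m assuming the repo ships a chat template:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "ALLaM-AI/ALLaM-7B-Instruct-preview"

    # Quantize to 4-bit so the weights fit on a free Colab T4
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
    )

    messages = [{"role": "user", "content": "Hello"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

A dedicated Inference Endpoint avoids the 10GB limit entirely because you choose the hardware yourself, but note that it is billed by the hour while running.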

This model is not currently available via any of the supported Inference Providers.