Issue with ALLaM-7B Model in Inference API - Size Limitation Error

Hi everyone,

I’m experiencing an issue with the ALLaM-7B model when trying to use it through the Hugging Face Inference API. Although the model is listed as supported in the HF Inference API documentation, I’m getting an error saying the model is too large to be loaded automatically.

Here’s the error message I’m seeing:

Cannot access content at: https://router.huggingface.co/hf-inference/models/ALLaM-AI/ALLaM-7B-Instruct-preview/v1/chat/completions. Make sure your token has the correct permissions. The model ALLaM-AI/ALLaM-7B-Instruct-preview is too large to be loaded automatically (14GB > 10GB).
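
For reference, a minimal call that reproduces the error looks roughly like this (a sketch, not my exact code; the token is a placeholder and the message content is illustrative):

    import requests

    # OpenAI-compatible chat completions route from the error message above
    API_URL = "https://router.huggingface.co/hf-inference/models/ALLaM-AI/ALLaM-7B-Instruct-preview/v1/chat/completions"
    headers = {"Authorization": "Bearer hf_xxx"}  # placeholder token

    payload = {
        "model": "ALLaM-AI/ALLaM-7B-Instruct-preview",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 64,
    }

    response = requests.post(API_URL, headers=headers, json=payload)
    # Prints the "too large to be loaded automatically (14GB > 10GB)" error
    print(response.status_code, response.text)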

Has anyone else encountered this issue? Are there any known workarounds or solutions for using the ALLaM-7B model with the Inference API? I’d appreciate any guidance or advice on how to resolve this issue.

Thanks

Model details:

  • Model name: ALLaM-7B
  • Endpoint: https://api-inference.huggingface.co/models/ALLaM-AI/ALLaM-7B-Instruct-preview
  • Error message: Size limitation error (14GB > 10GB)

Models larger than 10GB are not available through the Serverless Inference API unless Hugging Face grants the model an individual exception.

Also, the Serverless Inference API as a whole is currently undergoing major changes, so it’s hard to say how this will behave going forward…

If you really want to use it online, consider Spaces, a dedicated Inference Endpoint, or the free tier of Google Colab; a rough sketch of the Colab route follows below.
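
For the Colab (or any local GPU) route, here is a minimal sketch. It assumes a GPU runtime with transformers, accelerate, and bitsandbytes installed; the 4-bit quantization is my suggestion for fitting the ~14GB fp16 weights into the ~15GB of VRAM on a free T4, and I’m assuming the repo ships a chat template:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "ALLaM-AI/ALLaM-7B-Instruct-preview"

    # Quantize to 4-bit so the weights fit on a free Colab T4
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map="auto",
    )

    messages = [{"role": "user", "content": "Hello"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))

A dedicated Inference Endpoint avoids the 10GB limit entirely because you choose the hardware yourself, but note that it is billed by the hour while running.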

This model is not currently available via any of the supported Inference Providers.