Hi everyone,
I’m experiencing an issue with the ALLaM-7B model when trying to use it through the Hugging Face Inference API. Despite the model being listed as supported in the HF Inference API documentation, I’m receiving an error message indicating that the model is too large to be loaded automatically.
Here’s the error message I’m seeing:
Cannot access content at: https://router.huggingface.co/hf-inference/models/ALLaM-AI/ALLaM-7B-Instruct-preview/v1/chat/completions. Make sure your token has the correct permissions. The model ALLaM-AI/ALLaM-7B-Instruct-preview is too large to be loaded automatically (14GB > 10GB).
Has anyone else encountered this? Are there any known workarounds or solutions for using the ALLaM-7B model with the Inference API? I’d appreciate any guidance.
Thanks
Model details:
- Model name: ALLaM-AI/ALLaM-7B-Instruct-preview
- Endpoint: https://api-inference.huggingface.co/models/ALLaM-AI/ALLaM-7B-Instruct-preview
- Error message: Size limitation error (14GB > 10GB)
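For reference, here is a minimal sketch of the request that triggers the error, using only the Python standard library. It assumes a valid token in the HF_TOKEN environment variable; the prompt and max_tokens values are just illustrative.

```python
import json
import os
import urllib.request

MODEL = "ALLaM-AI/ALLaM-7B-Instruct-preview"
URL = f"https://router.huggingface.co/hf-inference/models/{MODEL}/v1/chat/completions"


def build_request(token: str) -> urllib.request.Request:
    """Build the chat-completions request that produces the size error."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": "Hello"}],  # illustrative prompt
        "max_tokens": 64,  # illustrative value
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Sending the request is what returns the "too large to be loaded
# automatically (14GB > 10GB)" response:
# urllib.request.urlopen(build_request(os.environ["HF_TOKEN"]))
```

The token has read permissions, so as far as I can tell this isn’t an authorization problem, only the size limit.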