"Too large to be loaded automatically (16GB > 10GB)" issue with Qwen2.5-VL-7B

I got the following error:

Error processing file Contract F 506 b 5 end date 14042025 (1).pdf: Error code: 403 - {'error': 'The model Qwen/Qwen2.5-VL-7B-Instruct is too large to be loaded automatically (16GB > 10GB).'}

while working with Qwen2.5-VL-7B-Instruct on an OCR task. At first I was using the serverless Inference Providers via Hugging Face (HF Inference API) and got this error, so I spun up a dedicated endpoint with the suggested GPU, but the issue persisted. I then upgraded to higher resources and still got the same error. Note that this whole setup worked until yesterday: I tested this exact file yesterday and got the expected response. Any help solving this would be appreciated.


Same here. Maybe related to this incident.

I think they have removed Qwen from the Hugging Face serverless calls; it is now available only via a dedicated endpoint.
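For anyone hitting the same 403: a minimal sketch of pointing your request at a dedicated Inference Endpoint instead of the serverless API. The endpoint URL and token below are placeholders, and the `{"inputs": ...}` payload is the generic endpoint shape, not something specific to this thread; adjust for your own deployment.

```python
import json
import urllib.request

# Placeholder values -- replace with your own dedicated endpoint URL and HF token.
ENDPOINT_URL = "https://your-endpoint.endpoints.huggingface.cloud"
HF_TOKEN = "hf_xxx"

def build_request(prompt: str) -> urllib.request.Request:
    """Build a POST request aimed at a dedicated endpoint (not the serverless API)."""
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Extract the contract end date from this document.")
# With a live endpoint you would then send it:
# response = urllib.request.urlopen(req)
```

The key difference from the serverless API is that the URL targets your own deployed endpoint, so the "too large to be loaded automatically" gating no longer applies.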
