"Too large to be loaded automatically (16GB > 10GB)" issue with Qwen2.5-VL-7B

I got the following error:

Error processing file Contract F 506 b 5 end date 14042025 (1).pdf: Error code: 403 - {'error': 'The model Qwen/Qwen2.5-VL-7B-Instruct is too large to be loaded automatically (16GB > 10GB).'}

while working with Qwen2.5-VL-7B-Instruct on an OCR task. At first I was using the serverless Inference Providers via Hugging Face (HF Inference API) and got this error, so I spun up a dedicated endpoint with the suggested GPU, but the issue persisted. I then upgraded to higher resources and still got the same error. Note that this whole setup worked until yesterday: I tested this exact file yesterday and got the expected response. Any help solving this would be appreciated.


Same here. Maybe related to this incident.

I think they have removed Qwen from the Hugging Face serverless calls; it is now available only via a dedicated endpoint.
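For anyone hitting the same 403: a minimal sketch of pointing your request at a dedicated Inference Endpoint instead of the serverless API. The endpoint URL and token below are placeholders, and the `{"inputs": ...}` payload is the generic endpoint shape, not something specific to this thread; adjust for your own deployment.

```python
import json
import urllib.request

# Placeholder values -- replace with your own dedicated endpoint URL and HF token.
ENDPOINT_URL = "https://your-endpoint.endpoints.huggingface.cloud"
HF_TOKEN = "hf_xxx"

def build_request(prompt: str) -> urllib.request.Request:
    """Build a POST request aimed at a dedicated endpoint (not the serverless API)."""
    payload = json.dumps({"inputs": prompt}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_request("Extract the contract end date from this document.")
# With a live endpoint you would then send it:
# response = urllib.request.urlopen(req)
```

The key difference from the serverless API is that the URL targets your own deployed endpoint, so the "too large to be loaded automatically" gating no longer applies.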
