How to run Trainer + vLLM on Quantized LLMs?

Hi everyone, I'm a beginner working through Mini-R1, trying the Countdown task with GRPOTrainer and vLLM, but it always fails whenever I apply quantization.

The code works well for:

accelerate + DeepSpeed + Qwen2.5 + vLLM
accelerate + PEFT + Qwen2.5 + vLLM
accelerate + PEFT + 4-bit quantization + Qwen2.5

When I try "accelerate + PEFT + 4-bit quantization + Qwen2.5 + vLLM", I always get this error:
[rank0]: File "/opt/Miniconda3/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 1008, in weight_loader
[rank0]:     assert param_data.shape == loaded_weight.shape
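A likely reading of that assertion: vLLM allocates the parameter at the model's full-precision shape, while bitsandbytes stores a 4-bit weight packed two values per uint8 byte (as a flattened column tensor), so the loaded shape can never match. A back-of-the-envelope illustration, assuming bnb-style NF4 packing:

```python
# Illustration (assuming bitsandbytes-style 4-bit packing) of why the
# shapes in vLLM's weight_loader assertion would disagree.
rows, cols = 4096, 4096          # example linear-layer weight

# Shape vLLM expects for the unquantized parameter
expected_shape = (rows, cols)

# Shape of the bnb-packed tensor: two 4-bit values per byte,
# stored as a flattened (n // 2, 1) uint8 tensor
packed_shape = (rows * cols // 2, 1)

print(expected_shape, packed_shape)
assert expected_shape != packed_shape  # hence the AssertionError
```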

Can anyone help me with this, please?

Is there any tutorial on making SFTTrainer work with vLLM on quantized LLMs, please?

1 Like

It seems this isn't supported for now…

And some related GRPO + vLLM issues:

2 Likes

Any tutorial on how to modify the configuration file would also help, please.

I saw that the official vLLM website says they support quantized LLMs with PEFT, but I wasn't able to find any tutorial on how to modify the existing Trainer. :sweat_smile:

Even if not with GRPOTrainer, any tutorial on making SFTTrainer work would also help, please.

2 Likes

I can't find a tutorial or reference either… :thinking:
Is it possible that the only way to train is with Transformers or other libraries…?
If you're looking for speed, you could check out Unsloth's training tools.

Unfortunately, Unsloth does not seem to support DDP in its free version. :melting_face:

1 Like