How to work with meta tensors?

atyshka · October 30, 2023, 5:56pm

I’m trying to quantize a CodeLlama model to int8 using SmoothQuant. I have a circular problem: I need to calibrate the model in fp16, but the reason I want to quantize it in the first place is that it won’t fit in GPU memory.

I’m using this code to quantize my model, and with device_map="auto" it takes advantage of accelerate to offload parts of the model to host memory. The model calibrates just fine, but I don’t know what to do afterwards, when the quantized weights are computed. Line 78 runs this code:

scales[layer_name_qkv]["x"] = scales[layer_name_q]["x"] / smoother

and fails with this error:

RuntimeError: Tensor on device meta is not on the expected device cuda:0!

That makes sense, but my question is how to move the tensor from ‘meta’ to ‘cuda:0’? The to() call doesn’t work because meta tensors have no data. It seems that accelerate/transformers automatically handles moving the data around with module hooks, but how can I achieve what I want here when I’m working with the raw tensors?
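To illustrate the behavior described above, here is a minimal sketch in plain PyTorch. It is not a fix for the quantization script. The names meta_weight, try_move and placeholder are made up for the example, and it uses "cpu" as the target device so it runs without a GPU. It shows that a meta tensor only records shape and dtype, that to() has nothing to copy, and that allocating a new tensor of the same shape works but gives uninitialized values.

```python
import torch

# A meta tensor records shape and dtype but owns no storage.
meta_weight = torch.empty(4, 8, dtype=torch.float16, device="meta")


def try_move(tensor, device):
    """Return the tensor moved to `device`, or None if it has no data to copy."""
    try:
        return tensor.to(device)
    except (NotImplementedError, RuntimeError):
        # PyTorch refuses to copy out of a meta tensor because there is no data.
        return None


# Fails: there are no values behind the meta tensor.
moved = try_move(meta_weight, "cpu")

# Allocating a fresh tensor with the same shape and dtype does work,
# but its contents are uninitialized. The real values still have to be
# loaded from wherever they are actually stored (checkpoint or offload).
placeholder = torch.empty_like(meta_weight, device="cpu")

print(meta_weight.is_meta, moved is None, placeholder.is_meta)
```

So the shape can be recovered from a meta tensor, but the values cannot. With offloading, the actual weights have to be fetched from wherever accelerate keeps them. Which hook or utility to use for that depends on the setup and is not shown here.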

jpcorb20 · April 16, 2025, 7:23pm

Any solution to this?
