How to work with meta tensors?

I’m trying to quantize a CodeLlama model to int8 using SmoothQuant. I have a circular problem: I need to calibrate the model in fp16, but the reason I want to quantize it in the first place is that it won’t fit in GPU memory.

I’m using this code to quantize my model. With device_map="auto", it uses accelerate to offload parts of the model to host memory. Calibration runs fine, but I don’t know what to do afterwards, when the quantized weights are computed. On line 78, which runs this code:

scales[layer_name_qkv]["x"] = scales[layer_name_q]["x"] / smoother

I get this error:

RuntimeError: Tensor on device meta is not on the expected device cuda:0!

That makes sense, but my question is: how do I move the tensor from ‘meta’ to ‘cuda:0’? The to() call doesn’t work because meta tensors have no data. It seems that accelerate/transformers handle moving the data around automatically with module hooks, but how can I achieve what I want here when I’m working with the raw tensors?
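A meta tensor records only shape, dtype and device, so there is nothing for to() to copy. The only way to get a usable tensor is to allocate real storage and fill it from wherever the actual values live. Below is a minimal sketch in plain PyTorch, assuming the real weights are available as an ordinary state dict. The helper materialize_from_state_dict and the toy nn.Linear are hypothetical illustrations, not part of accelerate or the SmoothQuant script:

```python
import torch
import torch.nn as nn


def materialize_from_state_dict(meta_module: nn.Module, state_dict: dict, device: str) -> nn.Module:
    """Hypothetical helper: give a meta-device module real storage, then fill it.

    to_empty() allocates *uninitialized* memory with the same shapes/dtypes;
    it cannot recover the original values, because meta tensors never had any.
    """
    meta_module.to_empty(device=device)
    # The actual values must come from a real source, e.g. a checkpoint.
    meta_module.load_state_dict(state_dict)
    return meta_module


# Stands in for the real weights (e.g. the offloaded copy on CPU or disk).
reference = nn.Linear(4, 2)
# Same layer on the meta device: shapes and dtypes only, no data.
meta_layer = nn.Linear(4, 2, device="meta")
print(meta_layer.weight.is_meta, tuple(meta_layer.weight.shape))

try:
    meta_layer.weight.to("cpu")  # fails: there is no data to copy out
except (NotImplementedError, RuntimeError) as err:
    print("to() failed:", err)

layer = materialize_from_state_dict(meta_layer, reference.state_dict(), device="cpu")
print(torch.equal(layer.weight, reference.weight))
```

Applied to the script, the same idea suggests computing smoother from weights that actually hold values: either from the checkpoint, or while accelerate’s hooks have loaded the layer onto the execution device during its forward pass. As far as I understand, offloaded modules keep only meta placeholders outside that window.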

Any solution to this?