How to work with meta tensors?

I'm trying to quantize a CodeLlama model to int8 using SmoothQuant. I have a circular problem: I need to calibrate the model in fp16, but the reason I want to quantize it in the first place is that it won't fit in GPU memory.

I'm using this code to quantize my model, and with device_map="auto" it's taking advantage of accelerate to offload parts of the model to host memory. The model calibrates just fine, but I don't know what to do later, when the quantized weights are computed. On line 78, which runs this code: scales[layer_name_qkv]["x"] = scales[layer_name_q]["x"] / smoother, I get this error: RuntimeError: Tensor on device meta is not on the expected device cuda:0!

That makes sense, but my question is how to move the tensor from 'meta' to 'cuda:0'? The to() call doesn't work because meta tensors have no data. It seems that accelerate/transformers automatically handles moving the data around with module hooks, but how can I achieve what I want here when I'm working with the raw tensors?
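
For reference, here is a minimal sketch of the behaviour I mean, using plain PyTorch only. It does not touch accelerate or the quantization script; the function name and the tensor shape are made up for illustration. It shows that a meta tensor keeps its shape and dtype, that to() fails because there is nothing to copy, and that torch.empty_like can allocate real storage with the same shape, although the values are uninitialized and the original weights still have to come from wherever they were offloaded.

```python
import torch


def describe_meta_tensor():
    """Illustrative helper (hypothetical name): inspect a meta tensor."""
    # A meta tensor records shape and dtype but owns no storage.
    t = torch.empty(4, 8, dtype=torch.float16, device="meta")
    info = {
        "is_meta": t.is_meta,
        "shape": tuple(t.shape),
        "dtype": t.dtype,
    }

    # Copying out of a meta tensor fails because there is no data to copy.
    try:
        t.to("cpu")
        info["to_failed"] = False
    except (NotImplementedError, RuntimeError) as exc:
        info["to_failed"] = True
        info["error"] = str(exc)

    # Real storage with the same shape/dtype can be allocated, but its
    # contents are uninitialized; the actual weights are NOT recovered.
    real = torch.empty_like(t, device="cpu")
    info["real_is_meta"] = real.is_meta
    info["real_shape"] = tuple(real.shape)
    return info


if __name__ == "__main__":
    for key, value in describe_meta_tensor().items():
        print(f"{key}: {value}")
```

So the real question is where the actual values live once accelerate has offloaded them, since the meta tensor itself is only a placeholder.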