I fine-tuned LLaMA 2 to test its query classification quality. After saving the final model, I converted the .bin files to safetensors and served the model with TGI. However, I noticed that I get completely different results compared to calling model.generate() directly. Note that all other sampling parameters (top_p, top_k, etc.) are identical, and temperature is set to a small positive value (0.01). After a lot of testing, I am confident that the safetensors conversion is the only variable between the two setups.
Is this a known issue or a bug in the conversion? I used the following code to convert: