I fine-tuned the bloomz-3b model with LoRA using Hugging Face's peft library. After successfully merging the LoRA weights into the base model, I ran into a problem when uploading the merged model back to the Hugging Face Hub: its size ballooned to 12 GB instead of the expected 6 GB, which exceeds the size limit for the Hugging Face Inference API. The model also times out when I use it through the LangChain Hugging Face Hub integration, presumably because of its size. How do I proceed?
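For what it's worth, a quick back-of-the-envelope calculation (assuming the model has roughly 3 billion parameters, as the bloomz-3b name suggests) shows why I suspect the merged checkpoint ended up in float32 rather than float16:

```python
# Checkpoint size is roughly (number of parameters) x (bytes per parameter).
# Assumption: ~3B parameters, consistent with the bloomz-3b model name.
n_params = 3_000_000_000
fp32_gb = n_params * 4 / 1e9  # float32: 4 bytes/param -> ~12 GB
fp16_gb = n_params * 2 / 1e9  # float16: 2 bytes/param -> ~6 GB
print(f"fp32: {fp32_gb} GB, fp16: {fp16_gb} GB")  # fp32: 12.0 GB, fp16: 6.0 GB
```

The 12 GB figure matches float32 storage almost exactly, while the expected 6 GB matches float16, so the doubling looks like a dtype issue during merging or saving rather than duplicated weights.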