Guide/Tutorial to write an inference endpoint for custom models

Hi @DisgustingOzil, the error came from following the wrong procedure: instead of merging the adapter weights with the base model, we uploaded only the adapter weights and tried to deploy those.

After reading the thread "Config.json is not saving after finetuning Llama 2" (post #5 by hemanthkumar23), we used the merge_and_unload() function and were able to upload a complete model that was deployable. Unfortunately, I don't know how to reduce the size of the merged model. Hope this helps!
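For anyone landing here later, this is roughly what the merge step looks like. It is a minimal sketch, not our exact script: it assumes a LoRA-style adapter trained with PEFT on a causal language model, and the function name `merge_adapter` and its three arguments are placeholders I made up for illustration.

```python
from pathlib import Path


def merge_adapter(base_model_id: str, adapter_dir: str, output_dir: str) -> Path:
    """Merge adapter weights into the base model and save a standalone checkpoint.

    Hypothetical helper: the name and arguments are illustrative assumptions.
    """
    # Imported lazily so this file can be loaded without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Load the base model in half precision to keep memory usage down.
    base = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype=torch.float16)

    # Attach the fine-tuned adapter on top of the base model.
    model = PeftModel.from_pretrained(base, adapter_dir)

    # Fold the adapter weights into the base layers and drop the PEFT wrappers.
    merged = model.merge_and_unload()

    # Save the full weights plus config.json, and the tokenizer alongside them.
    out = Path(output_dir)
    merged.save_pretrained(out, safe_serialization=True)
    AutoTokenizer.from_pretrained(base_model_id).save_pretrained(out)
    return out
```

You would call it with something like `merge_adapter("<base-model-id>", "./adapter", "./merged")` and then upload the contents of `./merged`. The merged checkpoint is about the size of the base model, since it contains every weight rather than just the adapter.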