Having trouble reproducing alpaca-lora results

We’ve been trying to reproduce the alpaca-lora results and are seeing odd behavior:

  1. The official weights work fine for the 7B model.
  2. When we finetune the same model with the official finetuning script, the resulting adapters produce nonsensical output when used with the official generation script (see the sanity check below).
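
One thing we can check directly is whether the saved adapter actually contains trained weights. The sketch below is a rough diagnostic, not taken from the repo: it assumes the adapter was saved via PEFT's `save_pretrained` into a directory like `./lora-alpaca` (adjust the path to your run). Since LoRA zero-initializes the `lora_B` matrices, an adapter whose `lora_B` tensors are still all zeros was effectively never trained, or was saved incorrectly, which would explain garbage generations:

```python
import torch

# Hypothetical path: wherever your finetuning run saved the adapter.
state_dict = torch.load("lora-alpaca/adapter_model.bin", map_location="cpu")

for name, tensor in state_dict.items():
    # lora_B is zero-initialized, so a non-zero abs-mean means training
    # actually updated it; all zeros point to a training or save problem.
    print(f"{name}: shape={tuple(tensor.shape)}, "
          f"abs_mean={tensor.float().abs().mean().item():.6f}")
```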

I suspect something is going wrong during finetuning: hyperparameter mismatches, RNG seeds, dataset construction and prompt formatting, or something else that’s hard to catch.
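
Prompt formatting is worth ruling out first, since any mismatch between the template used at finetuning time and at generation time reliably produces gibberish. Below is a sketch of the Alpaca-style template as I understand it; double-check it against the prompt-building code in your checkout of alpaca-lora, since the exact wording has varied across versions:

```python
def generate_prompt(instruction: str, input: str = "") -> str:
    # Alpaca-style template: the finetuning and generation scripts must
    # agree on this string exactly (including newlines) for the adapter
    # to produce sensible output.
    if input:
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. "
            "Write a response that appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input}\n\n"
            "### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```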
Any input from the alpaca-lora maintainers or the PEFT/LoRA developers would be welcome.