Small decoder-only model (< 1B parameters)

Also asked on Reddit