Why are there both chat and instruct models at 13B parameters?

It is quite interesting that there are separate chat and instruct models built on the same architecture! Why can't we train one model that does both? Is it because 13B is relatively small, so we want the model to be more specialized? And does the same design choice apply to a bigger model (say, 33B)?

On the application end, since the 13B models are specialized for certain tasks, can I combine multiple models to perform a bigger task? Or is it better to use a single bigger model to handle it?

Thank you in advance!