What would it take to get GPT-3.5 Turbo performance on an open-source model?

Falcon currently seems to be getting closest; however:

  • It seems the raw data it was trained on hasn’t been disclosed in detail; only portions have been released. Is that a concern?
  • The 40B model seems clunky to run in the cloud (it doesn’t easily fit on a single GPU; see the loading sketch after this list).
  • The Falcon blog post on Hugging Face doesn’t compare against GPT-3.5, but judging from other blogs/papers, Falcon’s Elo seems to be only a bit above LLaMA’s, so still quite a bit behind GPT-3.5?
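For reference on the second point, here is a minimal sketch of what loading the 40B checkpoint looks like with transformers, assuming accelerate and bitsandbytes are installed; even with 8-bit weights it needs roughly 40+ GB of GPU memory for the parameters alone, so in practice it gets sharded across devices rather than fitting on one card:

```python
# Sketch: loading Falcon-40B with 8-bit weights, sharded across available GPUs.
# Assumes `pip install transformers accelerate bitsandbytes` and enough total
# VRAM (roughly 40+ GB at 8-bit, so typically multiple GPUs or GPU + CPU offload).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # compute dtype for the non-quantized parts
    load_in_8bit=True,            # bitsandbytes int8 weights to shrink the footprint
    device_map="auto",            # let accelerate shard layers across GPUs/CPU
    trust_remote_code=True,       # Falcon ships custom modelling code on the Hub
)

inputs = tokenizer("Falcon is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```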

What would it take to get GPT4All-J, MPT, or Falcon to GPT-3.5 level?

Is the only solution to train Falcon for longer (is that what got GPT-3 to 3.5)?

Could some of the Vicuna or Orca tricks be employed while keeping an Apache 2.0 license?
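As far as I can tell, the licensing question mostly comes down to the data: the recipe itself (supervised instruction tuning, optionally with LoRA to keep it cheap) is generic, while Vicuna relies on ShareGPT conversations and Orca on GPT-4 explanation traces. A rough sketch of attaching LoRA adapters to Falcon-7B with peft is below; the dataset file is a placeholder standing in for whatever permissively licensed instruction data you substitute:

```python
# Sketch: LoRA supervised fine-tuning setup for Falcon-7B with peft.
# "instructions.jsonl" is a placeholder; the licensing point in the question
# comes down to using permissively licensed instruction/response data instead
# of ChatGPT/GPT-4 outputs.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_id = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Low-rank adapters on Falcon's fused attention projection keep the number of
# trainable parameters small enough for a single large GPU.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["query_key_value"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Placeholder dataset: JSONL with a "text" field holding formatted prompts/answers.
data = load_dataset("json", data_files="instructions.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024))

trainer = Trainer(
    model=model,
    train_dataset=data,
    args=TrainingArguments("falcon-7b-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=2e-4, bf16=True, logging_steps=10),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```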

Separately, but relatedly, can Falcon be extended in sequence length easily, the same way MPT can because of ALiBi? I notice Falcon shares the same keys and values across attention heads (multi-query attention), though as far as I can tell that is separate from the position-encoding choice, and Falcon uses rotary embeddings rather than ALiBi.
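For what it’s worth, the reason ALiBi extrapolates is that the bias it adds to attention scores depends only on the relative distance between query and key positions, so nothing is tied to the trained context length. A minimal sketch of that bias, assuming the head count is a power of two (the simple case from the ALiBi paper):

```python
# Sketch: the ALiBi attention bias, which penalizes attention scores linearly
# with the distance between query and key positions. Because it depends only on
# relative distance, it extrapolates to sequence lengths unseen in training.
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Return a (num_heads, seq_len, seq_len) additive bias for attention scores.

    Assumes num_heads is a power of two, the simple case from the ALiBi paper.
    """
    # Head-specific slopes: geometric sequence 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor([2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)])
    # Signed distance between key position j and query position i (j - i).
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]   # (seq_len, seq_len)
    # Penalty grows the further back the key is; future positions (j > i) are
    # clamped to zero here and masked out by the causal mask anyway.
    bias = slopes[:, None, None] * distance[None, :, :].clamp(max=0)
    return bias  # added to q @ k^T / sqrt(d) before the softmax

# Example: 8 heads over 16 positions; the same function works unchanged for
# 16k positions, which is the whole point of ALiBi.
print(alibi_bias(num_heads=8, seq_len=16).shape)  # torch.Size([8, 16, 16])
```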
