It seems Falcon is now getting closest, however:
- It seems the raw data it was trained on hasn't been disclosed in detail? Only portions have been released. Is that a concern?
- The 40B model seems clunky to run in the cloud (it doesn't easily fit on a single GPU; see the quantization sketch after this list)
- The Falcon blog post on Hugging Face doesn't compare against GPT-3.5, but judging from other blogs/papers, Falcon's Elo seems to be maybe a bit above LLaMA's, so still quite a bit behind GPT-3.5?
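(On the GPU-fit point: here's a rough sketch of what squeezing the 40B model onto a single large card looks like with 4-bit quantization, assuming the tiiuae/falcon-40b checkpoint and the bitsandbytes integration in transformers. The flags and memory numbers are my own estimates, not anything from the Falcon release.)

```python
# Rough sketch: loading Falcon-40B in 4-bit so it fits on one large GPU.
# Assumes a recent transformers with bitsandbytes installed; numbers approximate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "tiiuae/falcon-40b"

# 4-bit NF4 weights come to roughly 20-25 GB, versus ~80 GB in fp16,
# which is why the unquantized model won't fit on a single A100 40 GB.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",        # spills layers to CPU if the GPU is still too small
    trust_remote_code=True,   # Falcon shipped custom modeling code at release
)

inputs = tokenizer("Falcon-40B is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```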
What would it take to get GPT4All-J, MPT, or Falcon to GPT-3.5 level?
Is the only solution to train Falcon for longer (is that what got GPT-3 to GPT-3.5)?
Could some of the Vicuna or Orca tricks be employed while keeping an Apache 2.0 license?
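If the answer to that last question turns out to be "instruction-tune on permissively licensed data", here is a minimal sketch of what that might look like: LoRA fine-tuning of Falcon on databricks-dolly-15k, an openly licensed instruction set. The dataset choice, hyperparameters, and use of the 7B checkpoint are my own assumptions for illustration, and I'm not claiming this alone closes the gap to GPT-3.5.

```python
# Sketch only: Vicuna/Orca-style supervised fine-tuning of Falcon with LoRA,
# restricted to openly licensed instruction data to keep the license story clean.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

model_id = "tiiuae/falcon-7b"   # 7B used purely to keep the sketch small
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
)

# LoRA adapters on Falcon's fused attention projection; only a tiny
# fraction of parameters are trained.
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["query_key_value"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

data = load_dataset("databricks/databricks-dolly-15k", split="train")

def to_features(ex):
    prompt = f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['response']}"
    toks = tokenizer(prompt, truncation=True, max_length=512, padding="max_length")
    # Padding tokens would normally be masked out of the labels; omitted for brevity.
    toks["labels"] = toks["input_ids"].copy()
    return toks

train = data.map(to_features, remove_columns=data.column_names)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="falcon-sft", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           bf16=True, logging_steps=10),
    train_dataset=train,
).train()
```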
Separately, but relatedly, can Falcon's sequence length be extended easily, the same way MPT's can because of ALiBi? I notice Falcon shares the same keys and values across attention heads.
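On that last observation: as far as I can tell, shared keys/values across heads is multi-query attention, which shrinks the KV cache at inference time but has nothing to do with position encoding. Falcon uses rotary embeddings, whereas MPT's length extrapolation comes from ALiBi's linear distance penalty on attention scores. A quick sketch of that penalty (slopes follow the ALiBi paper's geometric schedule, simplified to power-of-two head counts) shows why it keeps working past the training length:

```python
# Sketch of ALiBi: a per-head linear penalty on attention logits proportional
# to the query-key distance, instead of learned or rotary position embeddings.
import torch

def alibi_slopes(n_heads: int) -> torch.Tensor:
    # Geometric slope schedule from the ALiBi paper (power-of-two head counts).
    start = 2 ** (-8 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(seq_len: int, n_heads: int) -> torch.Tensor:
    # bias[h, i, j] = -slope[h] * (i - j) for causal positions j <= i
    pos = torch.arange(seq_len)
    distance = (pos[:, None] - pos[None, :]).clamp(min=0)
    return -alibi_slopes(n_heads)[:, None, None] * distance

# scores: (batch, heads, q_len, k_len) attention logits before softmax
scores = torch.randn(1, 8, 16, 16)
probs = torch.softmax(scores + alibi_bias(16, 8), dim=-1)
print(probs.shape)  # torch.Size([1, 8, 16, 16])
```

Because the bias is a fixed function of distance rather than a learned embedding, nothing breaks when the sequence grows past the training length; with rotary embeddings you don't get that for free, so extending Falcon's context presumably takes more work than it does for MPT.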