What would it take to get GPT-3.5 Turbo performance on an open-source model?

Falcon currently seems to be getting closest; however:

  • It seems the raw data it was trained on hasn’t been disclosed in detail; only portions have been released. Is that a concern?
  • The 40B model seems clunky to run in the cloud (it doesn’t easily fit on a single GPU; see the loading sketch after this list).
  • The Falcon blog post on Hugging Face doesn’t compare against GPT-3.5, but judging from other blogs/papers, Falcon’s Elo seems to be only a bit above LLaMA’s, so still quite a bit behind GPT-3.5?
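For reference on the second point, here is a minimal sketch of what loading the 40B checkpoint looks like with transformers, assuming accelerate and bitsandbytes are installed; even with 8-bit weights it needs roughly 40+ GB of GPU memory for the parameters alone, so in practice it gets sharded across devices rather than fitting on one card:

```python
# Sketch: loading Falcon-40B with 8-bit weights, sharded across available GPUs.
# Assumes `pip install transformers accelerate bitsandbytes` and enough total
# VRAM (roughly 40+ GB at 8-bit, so typically multiple GPUs or GPU + CPU offload).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-40b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # compute dtype for the non-quantized parts
    load_in_8bit=True,            # bitsandbytes int8 weights to shrink the footprint
    device_map="auto",            # let accelerate shard layers across GPUs/CPU
    trust_remote_code=True,       # Falcon ships custom modelling code on the Hub
)

inputs = tokenizer("Falcon is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```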

What would it take to get GPT4All-J, MPT, or Falcon to GPT-3.5 level?

Is the only solution to train Falcon for longer (is that what got GPT-3 to 3.5)?

Could some of the Vicuna or Orca tricks be employed while keeping an Apache 2.0 license?
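As far as I can tell, the licensing question mostly comes down to the data: the recipe itself (supervised instruction tuning, optionally with LoRA to keep it cheap) is generic, while Vicuna relies on ShareGPT conversations and Orca on GPT-4 explanation traces. A rough sketch of attaching LoRA adapters to Falcon-7B with peft is below; the dataset file is a placeholder standing in for whatever permissively licensed instruction data you substitute:

```python
# Sketch: LoRA supervised fine-tuning setup for Falcon-7B with peft.
# "instructions.jsonl" is a placeholder; the licensing point in the question
# comes down to using permissively licensed instruction/response data instead
# of ChatGPT/GPT-4 outputs.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)

model_id = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

# Low-rank adapters on Falcon's fused attention projection keep the number of
# trainable parameters small enough for a single large GPU.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["query_key_value"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Placeholder dataset: JSONL with a "text" field holding formatted prompts/answers.
data = load_dataset("json", data_files="instructions.jsonl")["train"]
data = data.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024))

trainer = Trainer(
    model=model,
    train_dataset=data,
    args=TrainingArguments("falcon-7b-lora", per_device_train_batch_size=1,
                           gradient_accumulation_steps=16, num_train_epochs=1,
                           learning_rate=2e-4, bf16=True, logging_steps=10),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```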

Separately, but relatedly, can Falcon be extended in sequence length easily, the same way MPT can because of ALiBi? I notice Falcon shares the same keys and values across attention heads (multi-query attention), though as far as I can tell that is separate from the position-encoding choice, and Falcon uses rotary embeddings rather than ALiBi.
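For what it’s worth, the reason ALiBi extrapolates is that the bias it adds to attention scores depends only on the relative distance between query and key positions, so nothing is tied to the trained context length. A minimal sketch of that bias, assuming the head count is a power of two (the simple case from the ALiBi paper):

```python
# Sketch: the ALiBi attention bias, which penalizes attention scores linearly
# with the distance between query and key positions. Because it depends only on
# relative distance, it extrapolates to sequence lengths unseen in training.
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Return a (num_heads, seq_len, seq_len) additive bias for attention scores.

    Assumes num_heads is a power of two, the simple case from the ALiBi paper.
    """
    # Head-specific slopes: geometric sequence 2^(-8/n), 2^(-16/n), ...
    slopes = torch.tensor([2 ** (-8 * (h + 1) / num_heads) for h in range(num_heads)])
    # Signed distance between key position j and query position i (j - i).
    positions = torch.arange(seq_len)
    distance = positions[None, :] - positions[:, None]   # (seq_len, seq_len)
    # Penalty grows the further back the key is; future positions (j > i) are
    # clamped to zero here and masked out by the causal mask anyway.
    bias = slopes[:, None, None] * distance[None, :, :].clamp(max=0)
    return bias  # added to q @ k^T / sqrt(d) before the softmax

# Example: 8 heads over 16 positions; the same function works unchanged for
# 16k positions, which is the whole point of ALiBi.
print(alibi_bias(num_heads=8, seq_len=16).shape)  # torch.Size([8, 16, 16])
```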
