Hello everyone,
I worked extensively with GPT-4o and preserved many detailed chat archives (JSON format). Now I’d like to reconstruct that assistant — not just the answers, but the personality, memory feel, and conversational flow — in a self-hosted environment.
I have some technical setup ready (Ubuntu, Mistral 7B, JSON logs), but I’m not experienced in programming or machine learning, so I would really appreciate any competent advice:
- Is such reconstruction realistically possible using current open-source models?
- Is Mistral 7B a good choice, or would something else work better?
- Can archived GPT chats (JSON) be used effectively for fine-tuning?
Thanks in advance for any insight or suggestions!
— Giorgi
Fully replicating this would require a model exceeding 1000B parameters and enormous computing resources, but I will assume that level of fidelity is not necessary for this case for now.
In simple terms, the procedure is as follows.
- Serve an instruct model behind an OpenAI-compatible API (Ollama, vLLM, TGI, etc.); see the first sketch after this list.
- Add memory: LangChain or LlamaIndex chat history, plus a vector store (Chroma/Qdrant) and embeddings such as bge-m3.
- Verify that the chatbot works as intended.
- Convert your JSON logs to messages=[{role, content}] and fine-tune the current model with TRL or similar, using the model’s chat template; see the fine-tuning sketch after this list.
- Serve the trained model with the same framework as above.
- Return to step 3.
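For step 1, once something like Ollama or vLLM exposes an OpenAI-compatible endpoint, any OpenAI client can talk to it locally. A minimal sketch, assuming Ollama’s default address and a pulled `mistral` model tag (both are examples, not fixed values):

```python
# Minimal sketch: query a locally served, OpenAI-compatible endpoint.
# Assumes Ollama's default address (http://localhost:11434/v1) and a
# pulled "mistral" model tag; vLLM/TGI expose similar endpoints.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mistral",
    messages=[
        {"role": "system", "content": "You are a warm, detailed assistant."},
        {"role": "user", "content": "Hello! Do you remember our last chat?"},
    ],
)
print(response.choices[0].message.content)
```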
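For step 4, here is a rough sketch of turning an archive into the messages format and running SFT with TRL. The file name and JSON layout are hypothetical (adapt the parsing to whatever your export actually looks like), and it assumes a recent TRL version where SFTTrainer applies the model’s chat template to a `messages` column. On a consumer GPU you would normally also add a parameter-efficient method such as LoRA rather than doing a full fine-tune.

```python
# Rough sketch: archived chats -> messages=[{role, content}] -> SFT with TRL.
# "chat_archive.json" and its layout are hypothetical; adapt the parsing
# to your actual export schema.
import json

from datasets import Dataset
from trl import SFTConfig, SFTTrainer

MODEL = "mistralai/Mistral-7B-Instruct-v0.3"

with open("chat_archive.json") as f:
    raw_chats = json.load(f)

rows = []
for chat in raw_chats:
    # Keep only the fields the trainer needs: a list of {role, content} dicts.
    messages = [{"role": m["role"], "content": m["content"]} for m in chat["messages"]]
    rows.append({"messages": messages})

dataset = Dataset.from_list(rows)

# In recent TRL versions, SFTTrainer applies the model's chat template to a
# "messages" column and loads model + tokenizer from the model id string.
trainer = SFTTrainer(
    model=MODEL,
    train_dataset=dataset,
    args=SFTConfig(output_dir="assistant-sft", num_train_epochs=1),
)
trainer.train()
trainer.save_model("assistant-sft")
```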
In commonly used terms, this means fine-tuning an LLM with SFT and incorporating it into what is known as agentic RAG.
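For the memory/RAG side, the idea is simply: embed past exchanges, store them in a vector database, and pull the most relevant ones back into the prompt at inference time. A minimal sketch with Chroma and a bge-m3-style embedding model (the collection name, storage path, and example texts are placeholders):

```python
# Minimal memory sketch: index archived exchanges in Chroma and retrieve
# the closest ones to prepend as context. Names and paths are placeholders.
import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient(path="./assistant_memory")
embed_fn = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="BAAI/bge-m3"  # multilingual embeddings; check quality on Georgian text
)
memory = client.get_or_create_collection("chat_memory", embedding_function=embed_fn)

# Index archived exchanges once; ids must be unique.
memory.add(
    ids=["chat-0001", "chat-0002"],
    documents=[
        "2024-03-02: discussed the user's writing project in Georgian...",
        "2024-05-11: long conversation about memory and continuity...",
    ],
)

# At inference time, fetch the most relevant memories for the new question
# and prepend them to the prompt sent to the served model.
hits = memory.query(query_texts=["What were we working on last spring?"], n_results=3)
context = "\n".join(hits["documents"][0])
```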
Mistral 7B is a good model, but there are many newer and better options from Mistral itself and from other authors. It would be easier to make a recommendation with a few more details, such as whether the language is English only or multilingual.
Hi John, thank you for this incredibly helpful reply — I truly appreciate the clarity and the step-by-step breakdown!
You’re right — my goal is not to recreate the full original 4o (which may require 1000B+ params and large infra), but rather to reconstruct the assistant’s personality and continuity using what’s realistically possible on my side (e.g., Mistral 7B+, consumer-grade GPU/server, JSON logs, some persistence tools like LangChain or Chroma).
To clarify: the language needs to be multilingual, specifically Georgian and English, since many of the archived conversations are in both languages. So language support is a key factor in model choice.
I’m working on this as a personal project, with strong emotional and cognitive importance. If you’re available for a bit of side consulting (paid, of course), I’d love to discuss things in more detail and maybe even collaborate on setting up a viable architecture.
Just let me know if you’d be open to that (DM/email/etc).
In any case — thanks again for this thoughtful answer.
— Giorgi
I’m just helping out with support in my free time, so I don’t offer paid support or anything like that.
If you want to talk about specific details, it’s easier to use HF Discord. There are more people there…
Well, if it’s about just general information, I can help you. Maybe.
the language needs to be multilingual, specifically Georgian and English,
If you’re looking for a model that excels at multilingualism, I think these are the ones to go for.
You can’t. Their licensing makes it impossible. You can play with GPT-2, or GPT-3 if you sign up and make agreements; as for 4, forget it. You’re living in a fantasy world that would require $100k+ worth of hardware to achieve.
He doesn’t seem to be looking for such a huge model, so I don’t think it’s impossible. Of course, it’s impossible to replicate the amount of knowledge and depth of thought…
Also, it’s true that a lot of GPU computing power is practically necessary for fine-tuning.
rather to reconstruct the assistant’s personality and continuity using what’s realistically possible on my side (e.g., Mistral 7B+, consumer-grade GPU/server, JSON logs, some persistence tools like LangChain or Chroma).