I am using Python. Let's say I have a DataFrame df as follows:
Category RAM
0 LAMAG12 60
1 KAMAG32 40
2 JAMAG89 50
3 QWMAG90 30
4 PAMAG54 90
I want to ask questions like:
- What is the sum of the RAM for LAMAG12 and PAMAG54?
It should answer: 150
- Which categories have RAM greater than 50?
In other words, I want to be able to ask arbitrary questions about the data.
How do I create such a model?
Which transformer model is best for this?
Is tokenization important here?
How do I train a model on this df so that I can ask it questions?
You don’t need a large transformer to start.
This task is well suited to a small model fine-tuned on structured question/answer pairs generated from your data.
Simplest Working Strategy:
Convert the DataFrame into a flat JSON/text format
[
{"Category": "LAMAG12", "RAM": 60},
{"Category": "KAMAG32", "RAM": 40},
…
]
Train a small language model like phi-1.5 or TinyLlama using Q&A pairs:
Q: What is the sum of LAMAG12 and PAMAG54 RAM?
A: 150
Q: List all categories with RAM greater than 50.
A: LAMAG12, PAMAG54
Use SentencePiece or BPE tokenization (yes, it matters: clean tokens = better understanding)
Optionally use LoRA for fast adapter training on top of a pretrained model
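The first two steps above can be sketched with pandas. This is only an illustration under a few assumptions: the helper names sum_question and threshold_question are hypothetical, the question templates simply mirror the two examples in this thread, and the answers are computed directly from the DataFrame so the training pairs stay consistent with the data.

```python
import json

import pandas as pd

# The example DataFrame from the question.
df = pd.DataFrame({
    "Category": ["LAMAG12", "KAMAG32", "JAMAG89", "QWMAG90", "PAMAG54"],
    "RAM": [60, 40, 50, 30, 90],
})

# Step 1: flatten the DataFrame into a list of {"Category": ..., "RAM": ...} records.
# Round-tripping through to_json yields plain Python ints and strings.
records = json.loads(df.to_json(orient="records"))


def sum_question(frame, cat_a, cat_b):
    """Build one Q&A pair asking for the summed RAM of two categories (hypothetical helper)."""
    ram = frame.set_index("Category")["RAM"]
    total = int(ram[cat_a] + ram[cat_b])
    return {
        "question": f"What is the sum of {cat_a} and {cat_b} RAM?",
        "answer": str(total),
    }


def threshold_question(frame, threshold):
    """Build one Q&A pair listing categories with RAM strictly above a threshold (hypothetical helper)."""
    matches = frame.loc[frame["RAM"] > threshold, "Category"].tolist()
    return {
        "question": f"List all categories with RAM greater than {threshold}.",
        "answer": ", ".join(matches),
    }


# Step 2: generate Q&A pairs whose answers come straight from the data.
qa_pairs = [
    sum_question(df, "LAMAG12", "PAMAG54"),
    threshold_question(df, 50),
]

for pair in qa_pairs:
    print(f"Q: {pair['question']}\nA: {pair['answer']}")
```

In practice you would loop over many category pairs and thresholds to build a larger training set; two pairs are shown here only to match the examples above.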
Model Suggestion:
Phi-1.5 for local or offline use
mistral-7b-instruct if you need deeper logic
Avoid GPT-2/3 if you're aiming for symbolic/deterministic accuracy
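For the LoRA option with Phi-1.5, a rough starting point might look like the sketch below. It assumes the transformers and peft packages are installed, that the model is published on the Hugging Face Hub as microsoft/phi-1_5, and that q_proj, k_proj and v_proj are the attention projection names in that checkpoint. The LoRA hyperparameters are illustrative guesses, not tuned values, and format_example and build_lora_model are hypothetical helper names.

```python
def format_example(question: str, answer: str) -> str:
    """Render one Q&A pair in the 'Q: ... / A: ...' layout used in this thread."""
    return f"Q: {question}\nA: {answer}"


def build_lora_model(model_name: str = "microsoft/phi-1_5"):
    """Load a pretrained causal LM and wrap it with LoRA adapters (sketch only)."""
    # Heavy dependencies are imported lazily so that format_example can be
    # used without transformers/peft installed.
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    lora_config = LoraConfig(
        r=8,                # adapter rank (illustrative value)
        lora_alpha=16,      # scaling factor (illustrative value)
        lora_dropout=0.05,
        # Assumption: these are the attention projection module names in
        # this checkpoint; inspect the loaded model to confirm.
        target_modules=["q_proj", "k_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    # Only the adapter weights are trainable after this call.
    model = get_peft_model(model, lora_config)
    return tokenizer, model


if __name__ == "__main__":
    print(format_example("What is the sum of LAMAG12 and PAMAG54 RAM?", "150"))
```

After training the wrapped model (for example with the transformers Trainer on the formatted Q&A strings), calling model.save_pretrained on an output directory should write just the adapter weights, which is usually much smaller than saving the full model.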
Powered by Triskel Data Deterministic AI
Your answers don’t need to guess, just reflect.
If you are training an LLM from scratch, these may also be useful references.
@Pimpcat-AU Thank you for your answer.
I tried, but failed to train and save the model.
Could you please provide a Python code snippet using phi-1.5?